DeploymentConfig
Configuration options for model deployments
The `DeploymentConfig` class defines the parameters used to create private serverless deployments. Use it to specify the base model, resource allocation, and other deployment options.
Parameters
Parameter | Type | Required | Default | Description |
---|---|---|---|---|
`base_model` | string | Yes | | The base model to deploy. Can be a short name from our models list or a Hugging Face model path. |
`accelerator` | string | No | | Type of accelerator to use. If not specified, Predibase chooses the best suitable option. |
`cooldown_time` | integer | No | 3600 | Time in seconds before scaling down idle replicas. |
`hf_token` | string | No | | Hugging Face token for private models. |
`min_replicas` | integer | No | 0 | Minimum number of replicas. |
`max_replicas` | integer | No | 1 | Maximum number of replicas. |
`scale_up_threshold` | integer | No | 1 | Number of simultaneous requests before scaling up. |
`quantization` | string | No | Based on the model and accelerator | Quantization method (`none`, `fp8`, `bitsandbytes-nf4`). |
`uses_guaranteed_capacity` | boolean | No | false | Whether to use guaranteed capacity. |
`max_total_tokens` | integer | No | | Maximum number of tokens per request. |
`lorax_image_tag` | string | No | | Tag for the LoRAX image. |
`request_logging_enabled` | boolean | No | false | Whether to enable request logging. |
`direct_ingress` | boolean | No | false | Creates a direct endpoint to the LLM, bypassing the Predibase control plane. |
`preloaded_adapters` | array[string] | No | | List of adapter IDs to preload on deployment initialization. |
`speculator` | string | No | | Speculator to use for the deployment (`auto`, `disabled`, or the adapter ID of a Turbo or Turbo LoRA). |
`prefix_caching` | boolean | No | false | Whether to enable prefix caching. |
`custom_args` | array[string] | No | | Custom arguments to pass to the LoRAX launcher. |
Example Usage
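A minimal sketch using the Predibase Python SDK. The deployment name, API token placeholder, and model short name are illustrative, and the `pb.deployments.create` call shape is assumed from the SDK's deployment workflow:

```python
from predibase import Predibase, DeploymentConfig

# Illustrative client setup; substitute your own API token.
pb = Predibase(api_token="<PREDIBASE_API_TOKEN>")

# Deploy a base model as a private serverless deployment.
# "llama-3-1-8b-instruct" is an illustrative short name; any entry from the
# available-models list or a Hugging Face model path works for base_model.
pb.deployments.create(
    name="my-llama-deployment",   # illustrative deployment name
    config=DeploymentConfig(
        base_model="llama-3-1-8b-instruct",
        min_replicas=0,           # scale to zero when idle
        max_replicas=1,
        cooldown_time=3600,       # seconds before idle replicas scale down
    ),
)
```

With `min_replicas=0`, the deployment scales to zero after the cooldown period and incurs no cost while idle; raise `min_replicas` to keep replicas warm at all times.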
Additional Pointers
Supported models and revisions
- For `base_model`, use the short names provided in the list of available models, or provide the path to a Hugging Face model (e.g. "meta-llama/Meta-Llama-3-8B").
- You can optionally specify a Hugging Face revision for the base model by specifying the `base_model` param in the format `model@revision`, as shown in the sketch below.
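For instance, pinning the base model to a specific Hugging Face revision (a sketch; "main" stands in for any branch name or commit hash on the Hub):

```python
from predibase import DeploymentConfig

# Pin base_model to a Hugging Face revision using the model@revision format.
config = DeploymentConfig(base_model="meta-llama/Meta-Llama-3-8B@main")
```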
Speculative decoding
- By default, all Predibase deployments of supported models leverage speculative decoding to improve model performance.
- When `speculator` is set to `auto`, the deployment will use a pre-configured speculator based on its base model or, if not available, one of the preloaded Turbo or Turbo LoRA adapters provided by the user.
- Conversely, when `speculator` is `disabled`, no Turbo LoRA or Turbo adapters may be preloaded (i.e. they may not be specified in `preloaded_adapters`).
- When