DeploymentConfig

Below is the class definition for DeploymentConfig. It inherits from Pydantic's BaseModel.

from pydantic import BaseModel, Field, PositiveInt

class DeploymentConfig(BaseModel):
    base_model: str
    accelerator: str | None = Field(default=None)
    cooldown_time: PositiveInt | None = Field(default=43200)
    hf_token: str | None = Field(default=None)
    min_replicas: int | None = Field(default=None)
    max_replicas: int | None = Field(default=None)
    scale_up_threshold: int | None = Field(default=None)

Predibase sets the following default values (subject to change):

  • accelerator: Predibase chooses the most suitable option given your tier, the provided base_model, and current availability.
  • cooldown_time: 43200 seconds (12 hours)
  • min_replicas: 0
  • max_replicas: 1 (Maximum of 3 for free/dev tier and 6 for enterprise)
  • scale_up_threshold: 1

To enable autoscaling to 0, set min_replicas to 0. To autoscale past 1 replica, raise max_replicas to the desired value and tune scale_up_threshold. scale_up_threshold is the number of simultaneous requests a single LLM replica will handle before an additional replica is scaled up. The optimal value is highly dependent on your use case; we suggest experimenting to find a number that meets your throughput needs.
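
For example, a deployment that keeps one replica warm, scales up to four, and adds a replica once each one is serving two concurrent requests could be configured as follows (values are illustrative, assuming the DeploymentConfig class above is in scope):

```python
# Illustrative values only; tune scale_up_threshold for your workload.
autoscaled = DeploymentConfig(
    base_model="mistral-7b",  # placeholder short name
    min_replicas=1,           # keep one replica warm (0 enables scale-to-zero)
    max_replicas=4,           # subject to your tier's cap (3 free/dev, 6 enterprise)
    scale_up_threshold=2,     # concurrent requests per replica before scaling up
)
```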

If deploying a "best effort" custom base model:

  • You must provide an accelerator.
  • You must provide hf_token (your Hugging Face token) if deploying a private model.
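
A sketch of such a configuration (the model path, accelerator name, and token below are all placeholders; consult your Predibase account for the accelerators available to your tier):

```python
# All values below are placeholders.
custom = DeploymentConfig(
    base_model="my-org/my-custom-model",  # hypothetical Hugging Face model path
    accelerator="a100_80gb_100",          # hypothetical accelerator name
    hf_token="hf_...",                    # only needed for private models
)
```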

Note regarding base_model

Use the short names provided in the list of available models.