UpdateDeploymentConfig
Below is the class definition for the Update Deployment Config. It inherits from the BaseModel defined in Pydantic.
class UpdateDeploymentConfig(BaseModel):
    base_model: str | None = Field(default=None)
    cooldown_time: PositiveInt | None = Field(default=None)
    custom_args: list[str] | None = Field(default=None)
    hf_token: str | None = Field(default=None)
    min_replicas: NonNegativeInt | None = Field(default=None)
    max_replicas: PositiveInt | None = Field(default=None)
    scale_up_threshold: PositiveInt | None = Field(default=None)
    max_total_tokens: int | None = Field(default=None)
    lorax_image_tag: str | None = Field(default=None)
    request_logging_enabled: bool | None = Field(default=None)
    direct_ingress: bool | None = Field(default=None)
    preloaded_adapters: list[str] | None = Field(default=None)
    speculator: str | None = Field(default=None)
    prefix_caching: bool | None = Field(default=None)
    disable_adapters: bool | None = Field(default=None)
These fields have the same meaning as the corresponding fields in DeploymentConfig.
Note that many advanced configuration parameters are set through the custom_args field rather than through dedicated fields. See the custom_args section of DeploymentConfig for more information.
The default value of all of these fields is None. When updating a deployment, any fields set to None will be left at their currently-deployed value. Non-None fields will be updated to the value specified in the field.
See Update Deployment for more information.
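The "None means leave unchanged" rule can be illustrated with a minimal sketch. It uses standard-library dataclasses as stand-ins for the real Pydantic models so that it runs without dependencies; CurrentConfig, ConfigUpdate, and apply_update are hypothetical names invented for this illustration and are not part of the SDK, and the merge shown here is an assumption about the behavior described above, not the service's actual implementation.

```python
from dataclasses import dataclass, fields, replace
from typing import Optional


@dataclass(frozen=True)
class CurrentConfig:
    # Hypothetical stand-in for the settings a deployment is currently running with.
    min_replicas: int
    max_replicas: int
    cooldown_time: int


@dataclass(frozen=True)
class ConfigUpdate:
    # Hypothetical stand-in for UpdateDeploymentConfig: every field defaults to None.
    min_replicas: Optional[int] = None
    max_replicas: Optional[int] = None
    cooldown_time: Optional[int] = None


def apply_update(current: CurrentConfig, update: ConfigUpdate) -> CurrentConfig:
    """Return a new config in which only the non-None update fields are changed."""
    changes = {
        f.name: getattr(update, f.name)
        for f in fields(update)
        if getattr(update, f.name) is not None  # 0 and False still count as values
    }
    return replace(current, **changes)


current = CurrentConfig(min_replicas=0, max_replicas=1, cooldown_time=3600)
updated = apply_update(current, ConfigUpdate(max_replicas=2))
print(updated)
```

Only max_replicas changes in this sketch; min_replicas and cooldown_time keep their previous values because the update left them as None. The explicit `is not None` check matters: a value such as 0 for min_replicas is a real setting and would be applied.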
NOTES
- To update the lorax version to the latest supported version, set lorax_image_tag to <current>.
- When updating to a new custom_args value, the full value of the new custom_args field must be provided. E.g. if the existing value is ['foo'] and you want it to be ['foo', 'bar'], you must provide ['foo', 'bar'] as the new value.
- A deployment's base model can be updated via the SDK. Note that a base model with significantly different properties (e.g. model size) may not be servable with the same deployment configuration, so only update with a base model that is similar to the existing one. (If the new model is not servable with the existing configuration, the deployment will continue to serve with the previous version.)
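
Because custom_args is replaced as a whole rather than appended to, one way to avoid dropping existing arguments is to build the complete list before sending the update. The helper below is a minimal sketch; full_custom_args is a hypothetical name, and how you obtain the existing list is left as an assumption (it is hard-coded here).

```python
def full_custom_args(existing: list[str], additions: list[str]) -> list[str]:
    """Build the complete replacement list: existing args followed by the new ones."""
    return [*existing, *additions]


existing_args = ["foo"]  # assumed to be the value currently deployed
new_custom_args = full_custom_args(existing_args, ["bar"])
print(new_custom_args)  # ['foo', 'bar']
```

The resulting list is what would be passed as the custom_args field of the update. Passing only ['bar'] would replace the existing value and drop 'foo'.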