pb.deployments.update
Update an existing serverless deployment
Parameters:
deployment_ref: str
Name or UUID of the private serverless deployment
config: UpdateDeploymentConfig
Returns:
Deployment
Usage:
To update a deployment, call pb.deployments.update, and for the config parameter provide an instance of UpdateDeploymentConfig with only the fields you want to change set. Any fields not set will remain unchanged.
Example:
Assume we have an existing deployment named my-mistral-7b.
pb.deployments.get("my-mistral-7b")
The hf_token field is not shown in the output of pb.deployments.get, even if set, but it is persisted on the backend.
The call to pb.deployments.get might return the following configuration. This configuration specifies a default (preloaded) adapter and configures autoscaling between 1 and 2 replicas. Autoscaling is configured to scale down to 1 replica if traffic is low for at least 600 seconds, and to scale up to 2 replicas if the single running replica has at least one pending request.
Deployment(
name="my-mistral-7b",
# <...>
config=UpdateDeploymentConfig(
custom_args=["--preloaded-adapter-ids", "my_adapter/1"], cooldown_time=600, hf_token=None, min_replicas=1, max_replicas=2, scale_up_threshold=1
),
)
Update the deployment configuration by providing something like the following to pb.deployments.update. This configuration specifies an additional preloaded adapter (note that we must include the existing adapter in the new list). It also changes autoscaling behavior: the deployment can now scale down to no replicas when there is no traffic, and we scale down the number of replicas after 1200 seconds of low traffic (vs 600 seconds previously).
pb.deployments.update(
deployment_ref="my-mistral-7b",
config=UpdateDeploymentConfig(
cooldown_time=1200, # Changed from 600
custom_args=["--preloaded-adapter-ids", "my_adapter/1", "--preloaded-adapter-ids", "my_other_adapter/1"], # Added a second adapter
min_replicas=0, # Changed from 1
)
)
Now pb.deployments.get("my-mistral-7b") will return:
Deployment(
name="my-mistral-7b",
# <...>
config=UpdateDeploymentConfig(
custom_args=["--preloaded-adapter-ids", "my_adapter/1", "--preloaded-adapter-ids", "my_other_adapter/1"], cooldown_time=1200, hf_token=None, min_replicas=0, max_replicas=2, scale_up_threshold=1
),
)
Note that max_replicas remains at the non-default value of 2, even though it was not explicitly set in the call to pb.deployments.update.
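The "only set fields change" behavior can be illustrated with a small standalone sketch. It does not use the Predibase SDK and is not how the backend is implemented; ConfigSketch and apply_update are hypothetical names, and treating None as "not set" is an assumption made for illustration.

```python
from dataclasses import dataclass, fields, replace
from typing import List, Optional


@dataclass(frozen=True)
class ConfigSketch:
    """Hypothetical stand-in for an update config; None means 'not set'."""
    custom_args: Optional[List[str]] = None
    cooldown_time: Optional[int] = None
    min_replicas: Optional[int] = None
    max_replicas: Optional[int] = None
    scale_up_threshold: Optional[int] = None


def apply_update(current: ConfigSketch, update: ConfigSketch) -> ConfigSketch:
    """Return a copy of `current` with every field that `update` sets overwritten."""
    # Compare against None rather than truthiness so that 0 and [] count as set.
    changed = {
        f.name: getattr(update, f.name)
        for f in fields(update)
        if getattr(update, f.name) is not None
    }
    return replace(current, **changed)


current = ConfigSketch(
    custom_args=["--preloaded-adapter-ids", "my_adapter/1"],
    cooldown_time=600,
    min_replicas=1,
    max_replicas=2,
    scale_up_threshold=1,
)
update = ConfigSketch(
    cooldown_time=1200,
    custom_args=[
        "--preloaded-adapter-ids", "my_adapter/1",
        "--preloaded-adapter-ids", "my_other_adapter/1",
    ],
    min_replicas=0,
)
merged = apply_update(current, update)
```

Here merged keeps max_replicas and scale_up_threshold from the current configuration, while cooldown_time, min_replicas, and custom_args take the values from the update.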
- Updating a deployment will not cause any downtime. The existing deployment will continue to serve requests while the new configuration is applied.
- A very large number of (advanced) configuration parameters are configured by the custom_args field. See the custom_args section of DeploymentConfig for more information.
- To update Lorax to the latest supported version, specify lorax_image_tag="<current>" in the config parameter.
- Not all lorax CLI arguments are supported. Passing a non-supported argument will result in an error.
- The SDK and backend do not validate that the values of custom_args are valid lorax parameters. Passing an invalid value will result in Lorax failing to start the deployment. (However the existing deployment will continue to serve.) custom_args is intended as a break-glass feature for advanced users who need to pass additional parameters to Lorax.
- If you provide custom_args in the config parameter, it will replace the existing custom_args list. If you want to add to the existing list, you must include all the existing values in the new list.
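Because custom_args is replaced rather than merged, one way to add an argument is to build the full list from the existing values first. The sketch below is plain Python; extend_custom_args is a hypothetical helper, not part of the SDK, and the commented-out lines assume the existing list can be read from the config attribute of the object returned by pb.deployments.get, as the example output above suggests.

```python
from typing import List


def extend_custom_args(existing: List[str], extra: List[str]) -> List[str]:
    """Return a new list with `extra` appended, leaving `existing` untouched."""
    return list(existing) + list(extra)


# In practice the existing list would come from the deployment (assumption):
# existing_args = pb.deployments.get("my-mistral-7b").config.custom_args
existing_args = ["--preloaded-adapter-ids", "my_adapter/1"]

new_args = extend_custom_args(
    existing_args,
    ["--preloaded-adapter-ids", "my_other_adapter/1"],
)

# The combined list is then passed as the full replacement:
# pb.deployments.update(
#     deployment_ref="my-mistral-7b",
#     config=UpdateDeploymentConfig(custom_args=new_args),
# )
```

Returning a new list instead of mutating the existing one keeps the previously read configuration available for comparison if the update needs to be reverted.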