pb.deployments.update

Update an existing serverless deployment

Parameters:

   deployment_ref: str
Name or UUID of the private serverless deployment

   config: UpdateDeploymentConfig
The fields to change. Fields left unset keep their current values.

Returns:

   Deployment

Usage:

To update a deployment, call pb.deployments.update, and for the config parameter provide an instance of UpdateDeploymentConfig with only the fields you want to change set. Any fields not set will remain unchanged.
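The "only the fields you set are changed" behavior can be pictured with a small, self-contained model. This is a sketch, not the SDK's implementation: ConfigSketch and apply_update are hypothetical names, and the use of None to mean "not set" is an assumption made for illustration.

```python
from dataclasses import dataclass, fields, replace
from typing import List, Optional


@dataclass(frozen=True)
class ConfigSketch:
    """Hypothetical stand-in for UpdateDeploymentConfig; None means 'not set'."""
    custom_args: Optional[List[str]] = None
    cooldown_time: Optional[int] = None
    min_replicas: Optional[int] = None
    max_replicas: Optional[int] = None
    scale_up_threshold: Optional[int] = None


def apply_update(current: ConfigSketch, update: ConfigSketch) -> ConfigSketch:
    """Return a copy of `current` with every field that `update` sets overridden."""
    changes = {
        f.name: getattr(update, f.name)
        for f in fields(update)
        # Compare against None rather than truthiness so that 0 counts as a set value.
        if getattr(update, f.name) is not None
    }
    return replace(current, **changes)


current = ConfigSketch(
    custom_args=["--preloaded-adapter-ids", "my_adapter/1"],
    cooldown_time=600,
    min_replicas=1,
    max_replicas=2,
    scale_up_threshold=1,
)
merged = apply_update(current, ConfigSketch(cooldown_time=1200, min_replicas=0))
print(merged.cooldown_time, merged.min_replicas, merged.max_replicas)  # 1200 0 2
```

In this model, max_replicas and scale_up_threshold carry over untouched because the update never sets them, while min_replicas=0 is applied even though 0 is falsy.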

Example:

Assume we have an existing deployment named my-mistral-7b.

pb.deployments.get("my-mistral-7b")
NOTE

The hf_token field is not shown in the output of pb.deployments.get, even if set, but it is persisted on the backend.

The call might return the following configuration, which specifies a preloaded adapter and autoscaling between 1 and 2 replicas. The deployment scales down to 1 replica after traffic has been low for at least 600 seconds, and scales up to 2 replicas when the single running replica has at least one pending request.

Deployment(
    name="my-mistral-7b",
    # <...>
    config=UpdateDeploymentConfig(
        custom_args=["--preloaded-adapter-ids", "my_adapter/1"], cooldown_time=600, hf_token=None, min_replicas=1, max_replicas=2, scale_up_threshold=1
    ),
)

To update the deployment, pass a configuration like the following to pb.deployments.update. It adds a second preloaded adapter (the existing adapter must be included in the new list) and changes the autoscaling behavior: the deployment can now scale down to zero replicas when there is no traffic, and it scales down after 1200 seconds of low traffic (previously 600 seconds).

pb.deployments.update(
    deployment_ref="my-mistral-7b",
    config=UpdateDeploymentConfig(
        cooldown_time=1200,  # Changed from 600
        custom_args=[
            "--preloaded-adapter-ids", "my_adapter/1",
            "--preloaded-adapter-ids", "my_other_adapter/1",  # Added a second adapter
        ],
        min_replicas=0,  # Changed from 1
    ),
)

Now pb.deployments.get("my-mistral-7b") will return:

Deployment(
    name="my-mistral-7b",
    # <...>
    config=UpdateDeploymentConfig(
        custom_args=["--preloaded-adapter-ids", "my_adapter/1", "--preloaded-adapter-ids", "my_other_adapter/1"], cooldown_time=1200, hf_token=None, min_replicas=0, max_replicas=2, scale_up_threshold=1
    ),
)

Note that max_replicas remains at the non-default value of 2, even though it was not explicitly set in the call to pb.deployments.update.

NOTES
  • Updating a deployment will not cause any downtime. The existing deployment will continue to serve requests while the new configuration is applied.
  • Many advanced configuration parameters are set through the custom_args field. See the custom_args section of DeploymentConfig for more information.
  • To update LoRAX to the latest supported version, specify lorax_image_tag="<current>" in the config parameter.
  • Not all LoRAX CLI arguments are supported. Passing an unsupported argument will result in an error.
  • Neither the SDK nor the backend validates that the values in custom_args are valid LoRAX parameters. Passing an invalid value will cause LoRAX to fail to start the updated deployment, although the existing deployment will continue to serve requests. custom_args is intended as a break-glass feature for advanced users who need to pass additional parameters to LoRAX.
  • If you provide custom_args in the config parameter, it will replace the existing custom_args list. If you want to add to the existing list, you must include all the existing values in the new list.
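
Because custom_args is replaced as a whole, it can help to build the new list from the existing one rather than typing it out by hand. The helper below is a minimal sketch of that idea; with_preloaded_adapters is a hypothetical name, not part of the SDK, and it only handles the --preloaded-adapter-ids flag shown in the example above.

```python
from typing import Iterable, List


def with_preloaded_adapters(existing_args: List[str], adapter_ids: Iterable[str]) -> List[str]:
    """Return a new custom_args list: the existing values plus one flag/value pair per adapter."""
    new_args = list(existing_args)  # copy, so the caller's list is left untouched
    for adapter_id in adapter_ids:
        new_args += ["--preloaded-adapter-ids", adapter_id]
    return new_args


existing = ["--preloaded-adapter-ids", "my_adapter/1"]
updated = with_preloaded_adapters(existing, ["my_other_adapter/1"])
print(updated)
# ['--preloaded-adapter-ids', 'my_adapter/1', '--preloaded-adapter-ids', 'my_other_adapter/1']
```

The resulting list would then be passed as custom_args in the UpdateDeploymentConfig given to pb.deployments.update. How you obtain the existing list (for example, by reading it from the object returned by pb.deployments.get) is an assumption to check against the SDK version you use.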