The Deployments API provides methods for creating, retrieving, updating, and
managing model deployments.
Create Deployment
pb.deployments.create(
    name: str,  # Name for the deployment
    config: DeploymentConfig,  # Deployment configuration
    description: Optional[str] = None  # Description for the deployment (optional)
) -> Deployment
Create a private serverless deployment.
Parameters
- name: str - Name for the deployment
- config: DeploymentConfig - Deployment configuration
- description: str, optional - Description for the deployment
Returns
- Deployment - The created deployment object
Example 1: Basic deployment with defaults
from predibase import DeploymentConfig
# Create a deployment with the default configurations
deployment = pb.deployments.create(
    name="my-qwen3-8b",
    config=DeploymentConfig(base_model="qwen3-8b")
)
Example 2: Advanced deployment
from predibase import DeploymentConfig
# Create a deployment with custom configuration
deployment = pb.deployments.create(
    name="my-qwen3-8b",
    config=DeploymentConfig(
        base_model="qwen3-8b",
        max_replicas=2,
        min_replicas=1,
        quantization="fp8",  # Enable quantization
        max_total_tokens=4096,  # Change the default context window size
        request_logging_enabled=True,  # Enable request logging
        preloaded_adapters=["my-adapter/1", "my-adapter/2"],  # Preload adapters for performance
        prefix_caching=True  # Enable prefix caching
    ),
    description="Production-ready Qwen 3 model with logging"
)
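Before calling pb.deployments.create, it can help to check the replica settings on the client side. The sketch below is a hypothetical helper (check_replica_bounds is not part of the Predibase SDK). It assumes that min_replicas must be non-negative and no larger than max_replicas; the service itself may enforce different rules.

```python
def check_replica_bounds(min_replicas: int, max_replicas: int) -> dict:
    """Hypothetical client-side check; not part of the Predibase SDK.

    Assumes 0 <= min_replicas <= max_replicas and max_replicas >= 1.
    Returns the values as keyword arguments for DeploymentConfig.
    """
    if min_replicas < 0:
        raise ValueError("min_replicas must be non-negative")
    if max_replicas < 1:
        raise ValueError("max_replicas must be at least 1")
    if min_replicas > max_replicas:
        raise ValueError("min_replicas cannot exceed max_replicas")
    return {"min_replicas": min_replicas, "max_replicas": max_replicas}


replica_kwargs = check_replica_bounds(min_replicas=1, max_replicas=2)
# The result could then be passed on, for example:
# DeploymentConfig(base_model="qwen3-8b", **replica_kwargs)
```

Failing early with a ValueError keeps an obviously inconsistent configuration from reaching the API at all.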
Get Deployment
pb.deployments.get(
    deployment_ref: str  # Name of the deployment
) -> Deployment
Fetch a deployment by name.
Parameters
- deployment_ref: str - Name of the deployment
Returns
- Deployment - The requested deployment object
Example
# Get a deployment by name
deployment = pb.deployments.get("my-qwen3-8b")
# Print deployment details
print(f"Deployment: {deployment.name}")
print(f"Status: {deployment.status}")
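Because pb.deployments.get returns the current status, it can be polled until a deployment reaches a desired state. The following is a minimal sketch, not an SDK feature: wait_for_status is a hypothetical helper, and the caller supplies the target status string because the set of possible status values is not listed in this reference.

```python
import time
from typing import Any, Callable


def wait_for_status(
    fetch: Callable[[], Any],
    target_status: str,
    timeout_s: float = 600.0,
    poll_interval_s: float = 10.0,
) -> Any:
    """Poll fetch() until the returned object's .status equals target_status.

    Hypothetical helper; not part of the Predibase SDK.
    Raises TimeoutError if the status is not reached within timeout_s.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        deployment = fetch()
        if deployment.status == target_status:
            return deployment
        if time.monotonic() >= deadline:
            raise TimeoutError(
                f"status is {deployment.status!r}, expected {target_status!r}"
            )
        time.sleep(poll_interval_s)


# Usage sketch ("ready" is a placeholder, not a confirmed status value):
# deployment = wait_for_status(lambda: pb.deployments.get("my-qwen3-8b"), "ready")
```

Passing a callable rather than the client keeps the helper independent of how the deployment is fetched.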
Update Deployment
pb.deployments.update(
    deployment_ref: str,  # Name of the deployment
    config: UpdateDeploymentConfig,  # Deployment configuration
) -> Deployment
Update an existing deployment’s configuration.
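Parameters
- deployment_ref: str - Name of the deployment
- config: UpdateDeploymentConfig - Deployment configuration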
Returns
- Deployment - The updated deployment object
Example
from predibase import UpdateDeploymentConfig
# Update scaling configuration
pb.deployments.update(
    deployment_ref="my-qwen3-8b",
    config=UpdateDeploymentConfig(
        min_replicas=1,  # Change to always-on (disable scale-to-zero)
        max_replicas=2,  # Allow scaling to 2 replicas for higher load
        cooldown_time=1800  # Change cooldown time
    )
)
Warning for deployments with 1 replica
Some configuration updates spin down one replica of the deployment before any replicas of the new deployment version are created. If your deployment has only 1 replica, or is currently running with only 1 replica up, that replica is taken down, which causes downtime.
To prevent downtime, first update the deployment to min_replicas=2 so that at least one replica stays up to serve traffic during the update. Updating the replica counts of a deployment does not cause downtime.
To update LoRAX to the latest supported version, specify lorax_image_tag="<current>" in the config parameter.
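The downtime workaround described in the warning above can be sketched as a three-step sequence. This is an illustration under stated assumptions, not an SDK feature: update_without_downtime is a hypothetical wrapper, make_config is expected to be UpdateDeploymentConfig, the deployment's max_replicas is assumed to be at least 2 already, and UpdateDeploymentConfig is assumed to accept a subset of its fields. The sketch does not wait for the second replica to come up or for the rollout to finish, which a real script would likely need to do between steps.

```python
from typing import Any, Callable


def update_without_downtime(
    pb: Any,
    make_config: Callable[..., Any],
    deployment_ref: str,
    original_min_replicas: int,
    **changes: Any,
) -> Any:
    """Hypothetical wrapper around pb.deployments.update.

    make_config is expected to be UpdateDeploymentConfig. Assumes the
    deployment's max_replicas is already at least 2.
    """
    # Step 1: raise the floor so one replica keeps serving during the update.
    pb.deployments.update(
        deployment_ref=deployment_ref,
        config=make_config(min_replicas=2),
    )
    # Step 2: apply the change that may spin down a replica.
    pb.deployments.update(
        deployment_ref=deployment_ref,
        config=make_config(**changes),
    )
    # Step 3: restore the original floor; replica-count updates cause no downtime.
    return pb.deployments.update(
        deployment_ref=deployment_ref,
        config=make_config(min_replicas=original_min_replicas),
    )


# Usage sketch:
# update_without_downtime(pb, UpdateDeploymentConfig, "my-qwen3-8b", 1, cooldown_time=1800)
```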
List Deployments
pb.deployments.list(
    type: Optional[str] = None,  # Filter to show only "private" or "shared"
) -> list[Deployment]
List all deployments.
Parameters
- type: str, optional - Either "shared" or "private", to return only deployments of the specified type
Returns
- list[Deployment] - List of deployment objects
Example
# List private deployments
deployments = pb.deployments.list(type="private")
# Print deployment details
for deployment in deployments:
    print(f"Deployment: {deployment.name} - Status: {deployment.status}")
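When many deployments exist, a per-status summary can be easier to read than one line per deployment. The sketch below is a hypothetical helper (count_by_status is not part of the SDK) that relies only on the status attribute used in the example above.

```python
from collections import Counter
from typing import Any, Iterable


def count_by_status(deployments: Iterable[Any]) -> Counter:
    """Count deployments per value of their .status attribute."""
    return Counter(d.status for d in deployments)


# Usage sketch:
# for status, n in count_by_status(pb.deployments.list()).items():
#     print(f"{status}: {n}")
```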
Delete Deployment
pb.deployments.delete(
    deployment_ref: str  # Name of the deployment
) -> None
Delete a deployment.
Parameters
- deployment_ref: str - Name of the deployment
Returns
- None
Example
# Delete a deployment
pb.deployments.delete("my-qwen3-8b")
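Deletion can be combined with pb.deployments.list to clean up several deployments at once. The sketch below is a hypothetical helper (delete_by_prefix is not part of the SDK) that defaults to a dry run, so nothing is deleted unless dry_run=False is passed explicitly. It assumes deletion cannot be undone, which is why the default is conservative.

```python
from typing import Any, List


def delete_by_prefix(pb: Any, prefix: str, dry_run: bool = True) -> List[str]:
    """Delete private deployments whose name starts with prefix.

    Hypothetical helper; not part of the Predibase SDK. Returns the
    matching names. With dry_run=True (the default) nothing is deleted.
    """
    names = [
        d.name
        for d in pb.deployments.list(type="private")
        if d.name.startswith(prefix)
    ]
    if not dry_run:
        for name in names:
            pb.deployments.delete(name)
    return names


# Usage sketch: inspect the matches first, then delete for real.
# print(delete_by_prefix(pb, "tmp-"))
# delete_by_prefix(pb, "tmp-", dry_run=False)
```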
Download Request Logs
pb.deployments.download_request_logs(
    deployment_ref: str,  # Name of the deployment
    dest: Optional[str] = None,  # Local destination to download logs to
    adapter_id: Optional[str] = None,  # Retrieve logs for a particular adapter
    from_: Optional[str] = None,  # Start datetime, defaults to 1 day ago
    to: Optional[str] = None,  # End datetime, defaults to the current date
) -> str
Download request logs for a deployment.
Parameters
- deployment_ref: str - Name of the deployment
- dest: str, optional - Local destination to download logs to
- adapter_id: str, optional - Retrieve only logs pertaining to a specific adapter. If not set, only requests NOT specifying an adapter are retrieved.
- from_: str, optional - Starting date/time (ISO format)
- to: str, optional - Ending date/time (ISO format)
Returns
- str - Path to the downloaded logs
Example 1
# Download one day's worth of requests using the base model (no adapter)
pb.deployments.download_request_logs(
    deployment_ref="my-qwen3-8b",
    dest="/logs/",
    from_="2025-05-20",
    to="2025-05-21"
)
Example 2
# Download 7 days' worth of requests using adapter mymodel/1
pb.deployments.download_request_logs(
    deployment_ref="my-qwen3-8b",
    adapter_id="mymodel/1",
    from_="2025-05-20",
    to="2025-05-27"
)
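The from_ and to arguments take ISO-format strings, so a rolling window can be computed rather than hard-coded. The sketch below uses only the standard library; log_window is a hypothetical helper, and it produces date-only strings in the same form as the examples above.

```python
from datetime import date, timedelta
from typing import Optional, Tuple


def log_window(days: int, end: Optional[date] = None) -> Tuple[str, str]:
    """Return (from_, to) as ISO date strings covering `days` days up to `end`.

    Hypothetical helper; `end` defaults to today's date.
    """
    if days < 1:
        raise ValueError("days must be at least 1")
    end = end or date.today()
    start = end - timedelta(days=days)
    return start.isoformat(), end.isoformat()


from_, to = log_window(7, end=date(2025, 5, 27))
# from_ == "2025-05-20", to == "2025-05-27"

# Usage sketch:
# pb.deployments.download_request_logs(
#     deployment_ref="my-qwen3-8b", from_=from_, to=to
# )
```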