The Deployments API provides methods for creating, retrieving, updating, and managing model deployments.

Create Deployment

pb.deployments.create(
    name: str,                              # Name for the deployment
    config: DeploymentConfig,               # Deployment configuration
    description: Optional[str] = None       # Description for the deployment (optional)
) -> Deployment

Create a private serverless deployment.

Parameters

  • name: str - Name for the deployment
  • config: DeploymentConfig - Deployment configuration
  • description: str, optional - Description for the deployment

Returns

  • Deployment - The created deployment object

Example 1: Basic deployment with defaults

from predibase import DeploymentConfig

# Assumes `pb` is an already-initialized Predibase client.
# Create a deployment with the default configuration
deployment = pb.deployments.create(
    name="my-qwen3-8b",
    config=DeploymentConfig(base_model="qwen3-8b")
)

Example 2: Advanced deployment

from predibase import DeploymentConfig

# Create a deployment with custom configuration
deployment = pb.deployments.create(
    name="my-qwen3-8b",
    config=DeploymentConfig(
        base_model="qwen3-8b",
        max_replicas=2,
        min_replicas=1,
        quantization="fp8",  # Enable quantization
        max_total_tokens=4096,  # Change the default context window size
        request_logging_enabled=True,  # Enable request logging
        preloaded_adapters=["my-adapter/1", "my-adapter/2"],  # Preload adapters for performance
        prefix_caching=True,  # Enable prefix caching
    ),
    description="Production-ready Qwen 3 model with logging"
)

Get Deployment

pb.deployments.get(
    deployment_ref: str # Name of the deployment
) -> Deployment

Fetch a deployment by name.

Parameters

  • deployment_ref: str - Name of the deployment

Returns

  • Deployment - The requested deployment object

Example

# Get a deployment by name
deployment = pb.deployments.get("my-qwen3-8b")

# Print deployment details
print(f"Deployment: {deployment.name}")
print(f"Status: {deployment.status}")
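
When a script needs to wait until a deployment reaches a particular state, it can poll pb.deployments.get and compare deployment.status. The following is a minimal sketch, not part of the SDK: wait_for_status is a hypothetical helper, and the status string to wait for is supplied by the caller because the set of possible status values is not listed in this reference.

```python
import time


def wait_for_status(pb, deployment_ref, target_status, timeout=600.0,
                    interval=10.0, sleep=time.sleep, clock=time.monotonic):
    """Poll pb.deployments.get until deployment.status equals target_status.

    Hypothetical helper (not an SDK function). Returns the deployment object
    once the status matches, or raises TimeoutError after `timeout` seconds.
    `sleep` and `clock` are injectable so the loop can be tested without waiting.
    """
    deadline = clock() + timeout
    while True:
        deployment = pb.deployments.get(deployment_ref)
        if deployment.status == target_status:
            return deployment
        if clock() >= deadline:
            raise TimeoutError(
                f"{deployment_ref} did not reach status {target_status!r} "
                f"within {timeout} seconds (last status: {deployment.status!r})"
            )
        sleep(interval)
```

The polling interval and timeout defaults are arbitrary choices for illustration; tune them to how long your deployments usually take to change state.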

Update Deployment

pb.deployments.update(
    deployment_ref: str,            # Name of the deployment
    config: UpdateDeploymentConfig, # Deployment configuration
) -> Deployment

Update an existing deployment’s configuration.

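Parameters

  • deployment_ref: str - Name of the deployment
  • config: UpdateDeploymentConfig - Deployment configuration

Returns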

  • Deployment - The updated deployment object

Example

from predibase import UpdateDeploymentConfig

# Update scaling configuration
pb.deployments.update(
    deployment_ref="my-qwen3-8b",
    config=UpdateDeploymentConfig(
        min_replicas=1,    # Change to always-on (disable scale-to-zero)
        max_replicas=2,    # Allow scaling to 2 replicas for higher load
        cooldown_time=1800 # Change cooldown time
    )
)

Warning for deployments with 1 replica

Certain configuration updates spin down one replica of the deployment before any replicas of the new deployment version are created. If the deployment has only 1 replica, or only 1 replica is running at the time of the update, that replica is taken down, which causes downtime.

To prevent downtime, first update the deployment to min_replicas=2 so that at least one replica stays up to serve traffic during the update. Updating a deployment's replica counts does not cause downtime.
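
The two-step approach described in the warning above can be sketched as follows. This is an illustration under stated assumptions, not an SDK feature: update_with_spare_replica is a hypothetical helper, the caller builds both config objects, and the fixed settle_seconds pause is only a placeholder because this reference does not describe how to check how many replicas are currently up.

```python
import time


def update_with_spare_replica(pb, deployment_ref, scale_up_config, new_config,
                              settle_seconds=0.0):
    """Raise the replica floor first, then apply the real configuration change.

    Hypothetical helper (not an SDK function).
    scale_up_config: a config that only changes replica counts.
    new_config: the config carrying the update that restarts a replica.
    """
    # Step 1: replica-count-only update. Per the warning above, this kind of
    # update does not cause downtime.
    pb.deployments.update(deployment_ref=deployment_ref, config=scale_up_config)

    # Placeholder wait for the extra replica to come up (an assumption; a real
    # script would ideally confirm that a second replica is serving traffic).
    if settle_seconds > 0:
        time.sleep(settle_seconds)

    # Step 2: apply the intended update while another replica keeps serving.
    return pb.deployments.update(deployment_ref=deployment_ref, config=new_config)
```

With the SDK, scale_up_config could be something like UpdateDeploymentConfig(min_replicas=2, max_replicas=2), and new_config the UpdateDeploymentConfig holding the change you actually want. Because replica-count updates do not cause downtime, a further update can lower min_replicas again afterwards.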

To update LoRAX to the latest supported version, specify lorax_image_tag="<current>" in the config parameter.

List Deployments

pb.deployments.list(
    type: Optional[str] = None, # Filter to show only "private" or "shared"
) -> list[Deployment]

List all deployments.

Parameters

  • type: str, optional - Either "shared" or "private"; returns only deployments of the specified type

Returns

  • list[Deployment] - List of deployment objects

Example

# List private deployments
deployments = pb.deployments.list(type="private")

# Print deployment details
for deployment in deployments:
    print(f"Deployment: {deployment.name} - Status: {deployment.status}")
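
For a quick overview of many deployments, the list returned above can be summarized by status. This is a small sketch using only the status attribute shown in the example; count_by_status is a hypothetical helper name, not part of the SDK.

```python
from collections import Counter


def count_by_status(deployments):
    """Return a Counter mapping each status value to its number of deployments."""
    return Counter(d.status for d in deployments)


# Usage with the SDK (assumes an initialized client `pb`):
# print(count_by_status(pb.deployments.list(type="private")))
```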

Delete Deployment

pb.deployments.delete(
    deployment_ref: str     # Name of the deployment
) -> None

Delete a deployment.

Parameters

  • deployment_ref: str - Name of the deployment

Returns

  • None

Example

# Delete a deployment
pb.deployments.delete("my-qwen3-8b")

Download Request Logs

pb.deployments.download_request_logs(
    deployment_ref: str,              # Name of the deployment
    dest: Optional[str] = None,       # Local destination to download logs to
    adapter_id: Optional[str] = None, # Retrieve logs for a particular adapter
    from_: Optional[str] = None,      # Start datetime, defaults to 1 day ago
    to: Optional[str] = None,         # End datetime, defaults to current date
) -> str

Download request logs for a deployment.

Parameters

  • deployment_ref: str - Name of the deployment
  • dest: str, optional - Local destination to download logs to
  • adapter_id: str, optional - Retrieve only logs pertaining to a specific adapter. If not set, only requests NOT specifying an adapter are retrieved.
  • from_: str, optional - Starting date/time (ISO format)
  • to: str, optional - Ending date/time (ISO format)

Returns

  • str - Path to the downloaded logs

Example 1

# Download one day's worth of requests using the base model (no adapter)
pb.deployments.download_request_logs(
    deployment_ref="my-qwen3-8b",
    dest="/logs/",
    from_="2025-05-20",
    to="2025-05-21"
)
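
Rather than hard-coding dates, the from_ and to values can be computed relative to today. The sketch below produces date-only ISO strings in the same form as the example above; date_window is a hypothetical helper, and it assumes the API accepts date-only values as that example suggests.

```python
from datetime import date, timedelta


def date_window(days, end=None):
    """Return (from_, to) as ISO date strings spanning `days` days ending at `end`.

    Hypothetical helper (not an SDK function). `end` defaults to today's date.
    """
    if days <= 0:
        raise ValueError("days must be a positive integer")
    end = end or date.today()
    start = end - timedelta(days=days)
    return start.isoformat(), end.isoformat()


# Usage with the SDK (assumes an initialized client `pb`):
# from_, to = date_window(7)
# pb.deployments.download_request_logs(
#     deployment_ref="my-qwen3-8b", from_=from_, to=to
# )
```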

Example 2

# Download 7 days' worth of requests using adapter mymodel/1
pb.deployments.download_request_logs(
    deployment_ref="my-qwen3-8b",
    adapter_id="mymodel/1",
    from_="2025-05-14",
    to="2025-05-21"
)
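
Because a call without adapter_id returns only requests that did not specify an adapter, collecting every request for a deployment takes one call for the base model plus one call per adapter. A minimal sketch of that loop follows; download_logs_for_all is a hypothetical helper, and the list of adapter IDs must be supplied by the caller. Whether files from separate calls can collide in the same dest is not stated in this reference, so a separate dest per call may be the safer choice.

```python
def download_logs_for_all(pb, deployment_ref, adapter_ids, **kwargs):
    """Download base-model logs plus the logs for each adapter in adapter_ids.

    Hypothetical helper (not an SDK function). Extra keyword arguments such as
    dest, from_, and to are passed through unchanged. Returns a dict mapping
    each adapter ID (None for the base model) to the returned log path.
    """
    paths = {}
    # adapter_id=None retrieves only requests that did not specify an adapter.
    for adapter_id in [None, *adapter_ids]:
        paths[adapter_id] = pb.deployments.download_request_logs(
            deployment_ref=deployment_ref,
            adapter_id=adapter_id,
            **kwargs,
        )
    return paths
```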