Shared endpoints provide instant access to popular models for development and testing purposes. They’re designed for:

  • Quick experimentation and prototyping
  • Development and testing environments
  • Learning and evaluation of models
  • Proof of concept development

Note: Shared endpoints are not intended for production use. For production workloads, we strongly recommend using private deployments.

Quick Start

First, install the Predibase Python SDK:

pip install -U predibase

Then you can start experimenting with shared endpoints using just a few lines of code:

from predibase import Predibase

pb = Predibase(api_token="<PREDIBASE_API_TOKEN>")

# Use a shared endpoint for testing
client = pb.deployments.client("qwen3-8b")

# Generate text
response = client.generate("What is machine learning?")
print(response.generated_text)

Available Models

Predibase offers several popular models as shared endpoints for testing and development. See our supported models for the complete list.

Using Shared Endpoints

With Python SDK

Here's a more detailed example showing both basic text generation and streaming responses:

from predibase import Predibase

pb = Predibase(api_token="<PREDIBASE_API_TOKEN>")
client = pb.deployments.client("qwen3-8b")

# Basic generation with customizable parameters
response = client.generate(
    "Explain quantum computing in simple terms.",
    max_new_tokens=100
)
print(response.generated_text)

# Stream responses for real-time output
for response in client.generate_stream(
    "Write a story about a robot learning to paint.",
    max_new_tokens=200
):
    print(response.token.text, end="", flush=True)
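
The generate call also accepts additional sampling parameters. Here's a minimal sketch, assuming standard options such as temperature and top_p are supported (see the SDK reference for the full parameter list):

# Tune sampling behavior (parameter availability may vary; see the SDK reference)
response = client.generate(
    "Explain quantum computing in simple terms.",
    max_new_tokens=100,
    temperature=0.7,  # higher values produce more varied output
    top_p=0.9,        # nucleus sampling cutoff
)
print(response.generated_text)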

With REST API

For language-agnostic integration and testing, you can use our REST API:

# Set your credentials
export PREDIBASE_API_TOKEN="<YOUR TOKEN>"
export PREDIBASE_TENANT_ID="<YOUR TENANT ID>"
export PREDIBASE_MODEL="qwen3-8b"

# Generate text
curl -X POST https://serving.app.predibase.com/$PREDIBASE_TENANT_ID/deployments/v2/llms/$PREDIBASE_MODEL/generate \
    -H "Content-Type: application/json" \
    -H "Authorization: Bearer $PREDIBASE_API_TOKEN" \
    -d '{
        "inputs": "Explain quantum computing in simple terms.",
        "parameters": {
            "max_new_tokens": 100
        }
    }'

Testing Custom Models

You can test your fine-tuned adapters on shared endpoints during development:

# Test your custom adapter with a shared endpoint
response = client.generate(
    "Summarize this article.",
    adapter_id="my-summarizer/1",  # Your adapter ID
    max_new_tokens=100
)
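
Here the adapter ID takes the form <adapter-repo>/<version>, as in my-summarizer/1. Keep in mind that a LoRA adapter is tied to the base model it was fine-tuned from, so it can only be tested on a shared endpoint serving that same base model.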

For more information about using fine-tuned models, see the fine-tuning documentation.

Rate Limits

Shared endpoints are subject to rate limits, which restrict how often you can call the API within a given time period. Requests that exceed a limit fail with an HTTP 429 error code.
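
When you hit a limit, the right response is to back off and retry. Here's a minimal sketch of exponential backoff, reusing the url, headers, and payload variables from the REST example above (the x-ratelimit-reset header described below can give an exact wait time):

import time

import requests

def post_with_backoff(max_retries=5):
    """POST to the shared endpoint, backing off exponentially on HTTP 429."""
    for attempt in range(max_retries):
        response = requests.post(url, headers=headers, json=payload)
        if response.status_code != 429:
            response.raise_for_status()
            return response.json()
        time.sleep(2 ** attempt)  # wait 1s, 2s, 4s, ... before retrying
    raise RuntimeError(f"Still rate limited after {max_retries} retries")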

Rate Limits by Tier

Tier                     Rate Limit           Daily                    Monthly
Free                     1 request / sec      1 million tokens / day   10 million tokens / month
Developer & Enterprise   100 requests / sec   1 million tokens / day   10 million tokens / month
VPC                      Does not apply       Does not apply           Does not apply

Rate Limit Headers

When making API requests, you’ll receive the following headers that help you monitor your rate limit status:

Header                  Description
x-envoy-ratelimited     Whether the rate limit has been reached
x-ratelimit-limit       The maximum number of requests allowed before the rate limit is reached
x-ratelimit-remaining   The number of requests remaining before the rate limit is reached
x-ratelimit-reset       Time (in seconds) until you can query again
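
These headers are visible on the raw HTTP response, so the simplest way to inspect them is through the REST API. Here's a minimal sketch, again reusing the url, headers, and payload variables from the REST example above:

import requests

response = requests.post(url, headers=headers, json=payload)

# Header names as listed in the table above
remaining = response.headers.get("x-ratelimit-remaining")
reset_seconds = response.headers.get("x-ratelimit-reset")
if remaining is not None and int(remaining) == 0:
    print(f"Rate limit reached; safe to retry in {reset_seconds} seconds")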

For production use cases, use private deployments, which are not subject to rate limits.

Moving to Production

When you’re ready to move your application to production:

  1. Set up a private deployment for production-grade reliability (see the sketch after this list)
  2. Configure auto-scaling to handle your workload
  3. Take advantage of dedicated resources and SLAs
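
As a concrete starting point, here is a minimal sketch of creating a private deployment with the SDK; the deployment name is hypothetical, and the exact DeploymentConfig fields may differ, so check the deployments documentation for the current options:

from predibase import Predibase, DeploymentConfig

pb = Predibase(api_token="<PREDIBASE_API_TOKEN>")

# "my-qwen3-8b" is a hypothetical name for your dedicated deployment
deployment = pb.deployments.create(
    name="my-qwen3-8b",
    config=DeploymentConfig(
        base_model="qwen3-8b",
        min_replicas=0,  # scale to zero when idle
        max_replicas=1,
    ),
)

# The client API is the same as for shared endpoints
client = pb.deployments.client("my-qwen3-8b")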

Next Steps