Shared endpoints provide instant access to popular models for development and testing purposes. They’re designed for:

  • Quick experimentation and prototyping
  • Development and testing environments
  • Learning and evaluation of models
  • Proof of concept development

Note: Shared endpoints are not intended for production use. For production workloads, we strongly recommend using private deployments.

Quick Start

First, install the Predibase Python SDK:

pip install -U predibase

Then you can start experimenting with shared endpoints using just a few lines of code:

from predibase import Predibase

pb = Predibase(api_token="<PREDIBASE_API_TOKEN>")

# Use a shared endpoint for testing
client = pb.deployments.client("qwen3-8b")

# Generate text
response = client.generate("What is machine learning?")
print(response.generated_text)

Available Models

Predibase offers several popular models as shared endpoints for testing and development. See our supported models for the complete list.

Using Shared Endpoints

With Python SDK

Here's a more detailed example showing both basic text generation and streaming responses:

from predibase import Predibase

pb = Predibase(api_token="<PREDIBASE_API_TOKEN>")
client = pb.deployments.client("qwen3-8b")

# Basic generation with customizable parameters
response = client.generate(
    "Explain quantum computing in simple terms.",
    max_new_tokens=100
)
print(response.generated_text)

# Stream responses for real-time output
for response in client.generate_stream(
    "Write a story about a robot learning to paint.",
    max_new_tokens=200
):
    print(response.token.text, end="", flush=True)
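
The generate call also accepts additional sampling parameters. Here's a minimal sketch, assuming standard options such as temperature and top_p are supported (see the SDK reference for the full parameter list):

# Tune sampling behavior (parameter availability may vary; see the SDK reference)
response = client.generate(
    "Explain quantum computing in simple terms.",
    max_new_tokens=100,
    temperature=0.7,  # higher values produce more varied output
    top_p=0.9,        # nucleus sampling cutoff
)
print(response.generated_text)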

With REST API

For language-agnostic integration and testing, you can use our REST API:

# Set your credentials
export PREDIBASE_API_TOKEN="<YOUR TOKEN>"
export PREDIBASE_TENANT_ID="<YOUR TENANT ID>"
export PREDIBASE_MODEL="qwen3-8b"

# Generate text
curl -X POST https://serving.app.predibase.com/$PREDIBASE_TENANT_ID/deployments/v2/llms/$PREDIBASE_MODEL/generate \
    -H "Content-Type: application/json" \
    -H "Authorization: Bearer $PREDIBASE_API_TOKEN" \
    -d '{
        "inputs": "Explain quantum computing in simple terms.",
        "parameters": {
            "max_new_tokens": 100
        }
    }'

Testing Custom Models

You can test your fine-tuned adapters on shared endpoints during development:

# Test your custom adapter with a shared endpoint
response = client.generate(
    "Summarize this article.",
    adapter_id="my-summarizer/1",  # Your adapter ID
    max_new_tokens=100
)
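
Here the adapter ID takes the form <adapter-repo>/<version>, as in my-summarizer/1. Keep in mind that a LoRA adapter is tied to the base model it was fine-tuned from, so it can only be tested on a shared endpoint serving that same base model.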

For more information about using fine-tuned models, see the fine-tuning documentation.

Rate Limits

Shared endpoints are subject to rate limits, which restrict how often you can call the API within a given time period. Requests that exceed a limit fail with an HTTP 429 error code.
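
When you hit a limit, the right response is to back off and retry. Here's a minimal sketch of exponential backoff, reusing the url, headers, and payload variables from the REST example above (the x-ratelimit-reset header described below can give an exact wait time):

import time

import requests

def post_with_backoff(max_retries=5):
    """POST to the shared endpoint, backing off exponentially on HTTP 429."""
    for attempt in range(max_retries):
        response = requests.post(url, headers=headers, json=payload)
        if response.status_code != 429:
            response.raise_for_status()
            return response.json()
        time.sleep(2 ** attempt)  # wait 1s, 2s, 4s, ... before retrying
    raise RuntimeError(f"Still rate limited after {max_retries} retries")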

Rate Limits by Tier

Tier                     Rate Limit           Daily                    Monthly
Free                     1 request / sec      1 million tokens / day   10 million tokens / month
Developer & Enterprise   100 requests / sec   1 million tokens / day   10 million tokens / month
VPC                      Does not apply       Does not apply           Does not apply

Rate Limit Headers

When making API requests, you’ll receive the following headers that help you monitor your rate limit status:

Header                  Description
x-envoy-ratelimited     Whether the rate limit has been reached
x-ratelimit-limit       The maximum number of requests allowed before the rate limit is reached
x-ratelimit-remaining   The number of requests remaining before the rate limit is reached
x-ratelimit-reset       Time (in seconds) until you can query again
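
These headers are visible on the raw HTTP response, so the simplest way to inspect them is through the REST API. Here's a minimal sketch, again reusing the url, headers, and payload variables from the REST example above:

import requests

response = requests.post(url, headers=headers, json=payload)

# Header names as listed in the table above
remaining = response.headers.get("x-ratelimit-remaining")
reset_seconds = response.headers.get("x-ratelimit-reset")
if remaining is not None and int(remaining) == 0:
    print(f"Rate limit reached; safe to retry in {reset_seconds} seconds")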

For production use cases, use private deployments, which are not subject to rate limits.

Moving to Production

When you’re ready to move your application to production:

  1. Set up a private deployment for production-grade reliability (see the sketch after this list)
  2. Configure auto-scaling to handle your workload
  3. Take advantage of dedicated resources and SLAs
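
As a concrete starting point, here is a minimal sketch of creating a private deployment with the SDK; the deployment name is hypothetical, and the exact DeploymentConfig fields may differ, so check the deployments documentation for the current options:

from predibase import Predibase, DeploymentConfig

pb = Predibase(api_token="<PREDIBASE_API_TOKEN>")

# "my-qwen3-8b" is a hypothetical name for your dedicated deployment
deployment = pb.deployments.create(
    name="my-qwen3-8b",
    config=DeploymentConfig(
        base_model="qwen3-8b",
        min_replicas=0,  # scale to zero when idle
        max_replicas=1,
    ),
)

# The client API is the same as for shared endpoints
client = pb.deployments.client("my-qwen3-8b")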

Next Steps