Predibase offers two deployment categories:

  • Private Deployments - Dedicated resources with guaranteed availability, recommended for production use
  • Shared Endpoints - Pre-deployed models for quick experimentation and development

You can query these deployments using either the standard Predibase method or the OpenAI-compatible method.

Predibase Method

Python SDK

First, install the Predibase Python SDK:

pip install -U predibase

Then initialize the client and start generating:

from predibase import Predibase

# Initialize client
pb = Predibase(api_token="<PREDIBASE_API_TOKEN>")

# Get deployment client
client = pb.deployments.client("my-deployment")  # or use a shared endpoint like "qwen3-8b"

# Generate text
response = client.generate(
    "What is machine learning?",
    max_new_tokens=100,
    temperature=0.7
)
print(response.generated_text)

# Stream responses
for response in client.generate_stream(
    "Write a story about a robot learning to paint.",
    max_new_tokens=200
):
    print(response.token.text, end="", flush=True)
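
You can also target a fine-tuned adapter hosted on the same deployment. A minimal sketch, assuming an adapter repository named "my-adapter" with version 1 (the adapter_id value follows the "<repo>/<version>" convention used elsewhere in this guide):

# Query a fine-tuned adapter on the deployment
response = client.generate(
    "What is machine learning?",
    adapter_id="my-adapter/1",  # hypothetical adapter repo/version
    max_new_tokens=100
)
print(response.generated_text)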

REST API

You can also use our REST API directly:

# Set your credentials
export PREDIBASE_API_TOKEN="<YOUR TOKEN>"
export PREDIBASE_TENANT_ID="<YOUR TENANT ID>"
export PREDIBASE_MODEL="my-deployment"  # or a shared endpoint

# Generate text
curl -X POST https://serving.app.predibase.com/$PREDIBASE_TENANT_ID/deployments/v2/llms/$PREDIBASE_MODEL/generate \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $PREDIBASE_API_TOKEN" \
  -d '{
    "inputs": "What is machine learning?",
    "parameters": {
        "max_new_tokens": 100,
        "temperature": 0.7
    }
  }'
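
Streaming is also available over REST. A minimal sketch, assuming the deployment exposes a generate_stream endpoint (mirroring the generate path above) that returns server-sent events:

# Stream generated tokens as server-sent events
curl -X POST https://serving.app.predibase.com/$PREDIBASE_TENANT_ID/deployments/v2/llms/$PREDIBASE_MODEL/generate_stream \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $PREDIBASE_API_TOKEN" \
  -d '{
    "inputs": "Write a story about a robot learning to paint.",
    "parameters": {
        "max_new_tokens": 200
    }
  }'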

OpenAI-Compatible Method

Predibase provides endpoints compatible with the OpenAI Chat Completions v1 API, so existing OpenAI client code can be pointed at a Predibase deployment with minimal changes.

Python SDK

Use the OpenAI Python SDK with Predibase’s endpoints:

from openai import OpenAI

# Initialize client
api_token = "<PREDIBASE_API_TOKEN>"
tenant_id = "<PREDIBASE_TENANT_ID>"
model_name = "<DEPLOYMENT_NAME>"  # Ex. "qwen3-8b"
adapter = "<ADAPTER_REPO_NAME>/<VERSION_NUMBER>"  # Ex. "adapter-repo/1" (optional)
base_url = f"https://serving.app.predibase.com/{tenant_id}/deployments/v2/llms/{model_name}/v1"

client = OpenAI(
    api_key=api_token,
    base_url=base_url
)

# Chat completion
completion = client.chat.completions.create(
    model=adapter,  # Use empty string "" for base model
    messages=[
        {"role": "user", "content": "What is machine learning?"}
    ],
    max_tokens=100
)
print(completion.choices[0].message.content)

# Stream responses
completion_stream = client.chat.completions.create(
    model=adapter,  # Use empty string "" for base model
    messages=[
        {"role": "user", "content": "Write a story about a robot learning to paint."}
    ],
    stream=True
)
response = []
for message in completion_stream:
    token = message.choices[0].delta.content
    if token:  # some chunks (e.g. the first and last) carry no content
        response.append(token)
        print(token, end="", flush=True)

REST API

Use the OpenAI-compatible REST API:

# Set your credentials
export PREDIBASE_API_TOKEN="<YOUR TOKEN>"
export PREDIBASE_ENDPOINT="https://serving.app.predibase.com/<TENANT_ID>/deployments/v2/llms/<MODEL_NAME>"

# Chat completion
curl -i $PREDIBASE_ENDPOINT/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $PREDIBASE_API_TOKEN" \
  -d '{
    "model": "",
    "messages": [
      {"role": "user", "content": "What is machine learning?"}
    ],
    "max_tokens": 100
  }'
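
To stream over the OpenAI-compatible REST API, set "stream": true in the request body; the endpoint then returns server-sent events in the standard Chat Completions chunk format. A minimal sketch:

# Stream a chat completion as server-sent events
curl -i $PREDIBASE_ENDPOINT/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $PREDIBASE_API_TOKEN" \
  -d '{
    "model": "",
    "messages": [
      {"role": "user", "content": "Write a story about a robot learning to paint."}
    ],
    "max_tokens": 200,
    "stream": true
  }'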

Function Calling

Function calling allows models to interact with external tools and APIs in a structured way. To use function calling with Predibase deployments or adapters, define your tools and include them in requests made through the OpenAI-compatible Chat Completions endpoint:

from openai import OpenAI

# Initialize client
api_token = "<PREDIBASE_API_TOKEN>"
tenant_id = "<PREDIBASE_TENANT_ID>"
model_name = "<DEPLOYMENT_NAME>"  # Ex. "qwen3-8b"
adapter = "<ADAPTER_REPO_NAME>/<VERSION_NUMBER>"  # Ex. "adapter-repo/1" (optional)
base_url = f"https://serving.app.predibase.com/{tenant_id}/deployments/v2/llms/{model_name}/v1"

client = OpenAI(
    api_key=api_token,
    base_url=base_url
)

# Define functions
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {"type": "string"}
                }
            }
        }
    }
]

# Make request with functions
completion = client.chat.completions.create(
    model=adapter,
    messages=[
        {"role": "user", "content": "What's the weather like in San Francisco?"}
    ],
    max_tokens=100,
    tools=tools
)

print("Completion result:", completion.choices[0].message.content)

Structured Output

Predibase endpoints allow you to enforce that responses contain only valid JSON and adhere to a provided schema.

The schema can be provided either using JSON Schema (REST, Python) or Pydantic (Python).

Using Pydantic (Python SDK)

from pydantic import BaseModel, constr
from predibase import Predibase
import json

# Initialize Predibase client
pb = Predibase(api_token="<PREDIBASE_API_TOKEN>")

# Define a schema for the response
class Character(BaseModel):
    name: constr(max_length=10)
    age: int
    strength: int

# Get a handle to the base LLM deployment
client = pb.deployments.client("qwen3-8b")

# Generate a response that adheres to the schema
response = client.generate(
    "Generate a new character for my awesome game. Strength 1-10.",
    response_format={
        "type": "json_object",
        "schema": Character.model_json_schema(),
    },
    max_new_tokens=128,
)

# Load the response as JSON and init an object of the desired schema
response_json = json.loads(response.generated_text)
my_character = Character(**response_json)

Using JSON Schema (REST API)

# Export environment variables
export PREDIBASE_API_TOKEN="<YOUR_TOKEN_HERE>" # Settings > My Profile > Generate API Token
export PREDIBASE_TENANT_ID="<YOUR_TENANT_ID>" # Settings > My Profile > Overview > Tenant ID
export PREDIBASE_DEPLOYMENT="qwen3-8b"

# Query the LLM deployment
curl https://serving.app.predibase.com/$PREDIBASE_TENANT_ID/deployments/v2/llms/$PREDIBASE_DEPLOYMENT/generate \
    -H "Content-Type: application/json" \
    -X POST \
    -d '{
        "inputs": "Generate a new character for my awesome game. Strength 1-10.",
        "parameters": {
            "response_format": {
                "type": "json_object",
                "schema": {
                    "properties": {
                        "name": {"maxLength": 10, "title": "Name", "type": "string"},
                        "age": {"title": "Age", "type": "integer"},
                        "strength": {"title": "Strength", "type": "integer"}
                    },
                    "required": ["name", "age", "strength"],
                    "title": "Character",
                    "type": "object"
                }
            },
            "max_new_tokens": 128
        }
    }' \
    -H "Authorization: Bearer ${PREDIBASE_API_TOKEN}"

Complex Schemas

You can define more complex schemas with nested objects and arrays:

from pydantic import BaseModel, Field

class Skill(BaseModel):
    name: str
    level: int = Field(ge=1, le=100)
    description: str

class Character(BaseModel):
    name: str
    age: int
    skills: list[Skill]
    inventory: list[str]
    stats: dict[str, float]

# Reuses the deployment client from the previous example
response = client.generate(
    "Create a detailed RPG character with skills and inventory",
    response_format={
        "type": "json_object",
        "schema": Character.model_json_schema(),
    },
    max_new_tokens=512,  # leave room for the full nested JSON object
)
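
As before, the generated text can be validated back into the Pydantic model; model_validate_json (Pydantic v2) parses and validates in one step:

# Parse and validate the generated JSON in one step (Pydantic v2)
character = Character.model_validate_json(response.generated_text)
print(character.name, [s.name for s in character.skills])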