Predibase offers two deployment categories:

- **Private Deployments** - Dedicated resources with guaranteed availability, recommended for production use
- **Shared Endpoints** - Pre-deployed models for quick experimentation and development

You can query these deployments using either the standard Predibase method or the OpenAI-compatible method.
## Predibase Method

### Python SDK

First, install the Predibase Python SDK:
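The SDK is distributed on PyPI, so a standard `pip` install works (assuming Python and `pip` are already set up):

```shell
pip install -U predibase
```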
Then initialize the client and start generating:
```python
from predibase import Predibase

# Initialize client
pb = Predibase(api_token="<PREDIBASE_API_TOKEN>")

# Get deployment client
client = pb.deployments.client("my-deployment")  # or use a shared endpoint like "qwen3-8b"

# Generate text
response = client.generate(
    "What is machine learning?",
    max_new_tokens=100,
    temperature=0.7,
)
print(response.generated_text)

# Stream responses
for response in client.generate_stream(
    "Write a story about a robot learning to paint.",
    max_new_tokens=200,
):
    print(response.token.text, end="", flush=True)
```
### REST API

You can also use our REST API directly:
```bash
# Set your credentials
export PREDIBASE_API_TOKEN="<YOUR TOKEN>"
export PREDIBASE_TENANT_ID="<YOUR TENANT ID>"
export PREDIBASE_MODEL="my-deployment"  # or a shared endpoint

# Generate text
curl -X POST https://serving.app.predibase.com/$PREDIBASE_TENANT_ID/deployments/v2/llms/$PREDIBASE_MODEL/generate \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $PREDIBASE_API_TOKEN" \
  -d '{
    "inputs": "What is machine learning?",
    "parameters": {
      "max_new_tokens": 100,
      "temperature": 0.7
    }
  }'
```
## OpenAI-Compatible Method

Predibase supports OpenAI Chat Completions v1 compatible endpoints, making it easy to migrate from OpenAI to Predibase.

### Python SDK

Use the OpenAI Python SDK with Predibase's endpoints:
```python
from openai import OpenAI

# Initialize client
api_token = "<PREDIBASE_API_TOKEN>"
tenant_id = "<PREDIBASE_TENANT_ID>"
model_name = "<DEPLOYMENT_NAME>"  # Ex. "qwen3-8b"
adapter = "<ADAPTER_REPO_NAME>/<VERSION_NUMBER>"  # Ex. "adapter-repo/1" (optional)

base_url = f"https://serving.app.predibase.com/{tenant_id}/deployments/v2/llms/{model_name}/v1"

client = OpenAI(
    api_key=api_token,
    base_url=base_url,
)

# Chat completion
completion = client.chat.completions.create(
    model=adapter,  # Use empty string "" for base model
    messages=[
        {"role": "user", "content": "What is machine learning?"}
    ],
    max_tokens=100,
)
print(completion.choices[0].message.content)

# Stream responses
completion_stream = client.chat.completions.create(
    model=adapter,  # Use empty string "" for base model
    messages=[
        {"role": "user", "content": "Write a story about a robot learning to paint."}
    ],
    stream=True,
)

response = []
for message in completion_stream:
    token = message.choices[0].delta.content
    if token is not None:  # the final chunk may carry no content
        response.append(token)
        print(token, end="", flush=True)
```
### REST API

Use the OpenAI-compatible REST API:
```bash
# Set your credentials
export PREDIBASE_API_TOKEN="<YOUR TOKEN>"
export PREDIBASE_ENDPOINT="https://serving.app.predibase.com/<TENANT_ID>/deployments/v2/llms/<MODEL_NAME>"

# Chat completion
curl -i $PREDIBASE_ENDPOINT/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $PREDIBASE_API_TOKEN" \
  -d '{
    "model": "",
    "messages": [
      {"role": "user", "content": "What is machine learning?"}
    ],
    "max_tokens": 100
  }'
```
## Function Calling

Function calling allows models to interact with external tools and APIs in a structured way. To use function calling with Predibase deployments and/or adapters, define your functions and include them in your requests via the OpenAI Chat Completions v1 SDK method:
```python
from openai import OpenAI

# Initialize client
api_token = "<PREDIBASE_API_TOKEN>"
tenant_id = "<PREDIBASE_TENANT_ID>"
model_name = "<DEPLOYMENT_NAME>"  # Ex. "qwen3-8b"
adapter = "<ADAPTER_REPO_NAME>/<VERSION_NUMBER>"  # Ex. "adapter-repo/1" (optional)

base_url = f"https://serving.app.predibase.com/{tenant_id}/deployments/v2/llms/{model_name}/v1"

client = OpenAI(
    api_key=api_token,
    base_url=base_url,
)

# Define functions
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {"type": "string"}
                }
            }
        }
    }
]

# Make request with functions
completion = client.chat.completions.create(
    model=adapter,
    messages=[
        {"role": "user", "content": "What's the weather like in San Francisco?"}
    ],
    max_tokens=100,
    tools=tools,
)
print("Completion result:", completion.choices[0].message.content)
```
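When the model decides to invoke a tool, an OpenAI-compatible response typically carries the invocation in `message.tool_calls` (with JSON-encoded arguments) rather than in `content`. A minimal sketch of extracting it, using a plain dict shaped like such a response (the payload below is illustrative, not actual Predibase output):

```python
import json

def extract_tool_call(message: dict):
    """Return (function_name, parsed_arguments), or None if no tool call."""
    tool_calls = message.get("tool_calls") or []
    if not tool_calls:
        return None
    call = tool_calls[0]["function"]
    # Arguments arrive as a JSON string and must be parsed
    return call["name"], json.loads(call["arguments"])

# Example message shaped like an OpenAI-style tool-call response
message = {
    "role": "assistant",
    "content": None,
    "tool_calls": [{
        "id": "call_0",
        "type": "function",
        "function": {
            "name": "get_weather",
            "arguments": "{\"location\": \"San Francisco\"}",
        },
    }],
}

name, args = extract_tool_call(message)
print(name, args)  # get_weather {'location': 'San Francisco'}
```

With the OpenAI SDK objects, the same fields are available as attributes, e.g. `completion.choices[0].message.tool_calls[0].function.arguments`.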
## Structured Output

Predibase endpoints allow you to enforce that responses contain only valid JSON and adhere to a provided schema. The schema can be provided either as JSON Schema (REST, Python) or as a Pydantic model (Python).

### Using Pydantic (Python SDK)
```python
import json

from predibase import Predibase
from pydantic import BaseModel, constr

# Initialize Predibase client
pb = Predibase(api_token="<PREDIBASE_API_TOKEN>")

# Define a schema for the response
class Character(BaseModel):
    name: constr(max_length=10)
    age: int
    strength: int

# Get a handle to the base LLM deployment
client = pb.deployments.client("qwen3-8b")

# Generate a response that adheres to the schema
response = client.generate(
    "Generate a new character for my awesome game. Strength 1-10.",
    response_format={
        "type": "json_object",
        "schema": Character.model_json_schema(),
    },
    max_new_tokens=128,
)

# Load the response as JSON and instantiate an object of the desired schema
response_json = json.loads(response.generated_text)
my_character = Character(**response_json)
```
### Using JSON Schema (REST API)
```bash
# Export environment variables
export PREDIBASE_API_TOKEN="<YOUR_TOKEN_HERE>"  # Settings > My Profile > Generate API Token
export PREDIBASE_TENANT_ID="<YOUR_TENANT_ID>"   # Settings > My Profile > Overview > Tenant ID
export PREDIBASE_DEPLOYMENT="qwen3-8b"

# Query the LLM deployment
curl https://serving.app.predibase.com/$PREDIBASE_TENANT_ID/deployments/v2/llms/$PREDIBASE_DEPLOYMENT/generate \
  -X POST \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer ${PREDIBASE_API_TOKEN}" \
  -d '{
    "inputs": "Generate a new character for my awesome game. Strength 1-10.",
    "parameters": {
      "response_format": {
        "type": "json_object",
        "schema": {
          "properties": {
            "name": {"maxLength": 10, "title": "Name", "type": "string"},
            "age": {"title": "Age", "type": "integer"},
            "strength": {"title": "Strength", "type": "integer"}
          },
          "required": ["name", "age", "strength"],
          "title": "Character",
          "type": "object"
        }
      },
      "max_new_tokens": 128
    }
  }'
```
### Complex Schemas

You can define more complex schemas with nested objects and arrays:
```python
from typing import Dict, List

from pydantic import BaseModel, Field

class Skill(BaseModel):
    name: str
    level: int = Field(ge=1, le=100)
    description: str

class Character(BaseModel):
    name: str
    age: int
    skills: List[Skill]
    inventory: List[str]
    stats: Dict[str, float]

response = client.generate(
    "Create a detailed RPG character with skills and inventory",
    response_format={
        "type": "json_object",
        "schema": Character.model_json_schema(),
    },
)
```
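Because the endpoint enforces the schema, nested objects and arrays come back as plain dicts and lists that can be consumed directly. A minimal sketch, using an inline example payload as a stand-in for `response.generated_text` (the values are hypothetical, not actual model output):

```python
import json

# Hypothetical output shaped like the Character schema above
generated_text = """
{
  "name": "Kara",
  "age": 27,
  "skills": [
    {"name": "Archery", "level": 72, "description": "Long-range attacks"},
    {"name": "Stealth", "level": 55, "description": "Move unseen"}
  ],
  "inventory": ["bow", "rope"],
  "stats": {"hp": 34.0, "mana": 12.5}
}
"""

character = json.loads(generated_text)

# Nested lists of objects deserialize as lists of dicts
top_skill = max(character["skills"], key=lambda s: s["level"])
print(f'{character["name"]} (age {character["age"]}) - best skill: {top_skill["name"]}')
```

With the Pydantic models defined above, `Character.model_validate_json(generated_text)` performs the same parse with full type validation.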