Predibase supports using fine-tuned adapters to customize model behavior. You
can:
- Upload local PEFT adapters
- Use adapters from Hugging Face Hub
- Deploy adapters with any compatible base model
## Quick Start

First, install the Predibase Python SDK:
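```bash
pip install -U predibase
```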
## Adapter Sources and IDs

Adapters can come from three sources:
### 1. Predibase Adapters

You can fine-tune adapters on Predibase using the SDK or UI. Once trained, you can prompt them through a deployment of the base model that was fine-tuned, without needing to pre-load the adapter or recreate the deployment. All deployments support adapter inference out of the box.

```python
from predibase import Predibase

pb = Predibase(api_token="<PREDIBASE_API_TOKEN>")

# Specify the deployment of the base model
client = pb.deployments.client("my-qwen3-8b")

# Specify your adapter_id as "adapter-repo-name/adapter-version-number"
print(client.generate("hello", adapter_id="repo-name/1", max_new_tokens=100).generated_text)

# Use a specific adapter checkpoint, referenced as "repo-name/version@checkpoint"
print(client.generate("hello", adapter_id="repo-name/1@7", max_new_tokens=100).generated_text)
```
### 2. Upload Local Adapters

Import an adapter trained outside of Predibase for inference. Your local adapter directory must follow the PEFT format:

```
/path/to/adapter/
├── adapter_config.json                            # Configuration file
└── adapter_model.safetensors or adapter_model.bin # Model weights
```
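For reference, here is a minimal sketch of what a PEFT `adapter_config.json` for a LoRA adapter typically contains; the exact fields and values below are illustrative and depend on how the adapter was trained:

```json
{
  "base_model_name_or_path": "Qwen/Qwen3-8B",
  "peft_type": "LORA",
  "task_type": "CAUSAL_LM",
  "r": 16,
  "lora_alpha": 32,
  "lora_dropout": 0.05,
  "target_modules": ["q_proj", "k_proj", "v_proj", "o_proj"]
}
```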
Upload and prompt the adapter:
```python
from predibase import Predibase

pb = Predibase(api_token="<PREDIBASE_API_TOKEN>")

# Upload adapter weights
adapter = pb.adapters.upload(
    local_dir="./my_adapter_weights",
    repo="repo-name",
)

# Specify the deployment of the base model
client = pb.deployments.client("my-qwen3-8b")

# Specify your adapter_id as "adapter-repo-name/adapter-version-number"
print(client.generate("hello", adapter_id="repo-name/1", max_new_tokens=100).generated_text)
```
### 3. Hugging Face Hub Adapters

When using adapters from Hugging Face:

- Adapter ID format: `"organization/adapter-name"`
- Examples: `"predibase/tldr_headline_gen"`, `"predibase/mistral-instruct"`
- You must specify `adapter_source="hub"`
#### Public Adapters

```python
# Use a public adapter from Hugging Face
client = pb.deployments.client("my-qwen3-8b")

response = client.generate(
    "The following passage is content from a news report. Please summarize this passage in one sentence or less. Passage: Jeffrey Berns, CEO of Blockchains LLC, wants the Nevada government to allow companies like his to form local governments on land they own, granting them power over everything from schools to law enforcement. Berns envisions a city based on digital currencies and blockchain storage. His company is proposing to build a 15,000 home town 12 miles east of Reno. Nevada Lawmakers have responded with intrigue and skepticism. The proposed legislation has yet to be formally filed or discussed in public hearings. Summary: ",
    adapter_id="<org>/<public-adapter>",  # Hugging Face public adapter path
    adapter_source="hub",                 # Specify Hub as source
    max_new_tokens=256,
)
print(response.generated_text)
```
#### Private Adapters

To run inference on your private adapter, you'll additionally need:

- A Hugging Face API token with write access

```python
# Use a private adapter from Hugging Face
client = pb.deployments.client("my-qwen3-8b")

response = client.generate(
    "The following passage is content from a news report. Please summarize this passage in one sentence or less. Passage: Jeffrey Berns, CEO of Blockchains LLC, wants the Nevada government to allow companies like his to form local governments on land they own, granting them power over everything from schools to law enforcement. Berns envisions a city based on digital currencies and blockchain storage. His company is proposing to build a 15,000 home town 12 miles east of Reno. Nevada Lawmakers have responded with intrigue and skepticism. The proposed legislation has yet to be formally filed or discussed in public hearings. Summary: ",
    adapter_id="<org>/<private-adapter>",
    adapter_source="hub",
    api_token="<HUGGINGFACE_API_TOKEN>",  # Required for private adapters
    max_new_tokens=256,
)
print(response.generated_text)
```
## With REST API

Access adapters through the REST API for language-agnostic integration. First, set up your environment variables:

```bash
# Set your credentials
export PREDIBASE_API_TOKEN="<YOUR TOKEN HERE>"
export PREDIBASE_TENANT_ID="<YOUR TENANT ID>"
```
For `PREDIBASE_DEPLOYMENT`, the base model must correspond to the model that was fine-tuned:

- For shared LLMs, use the model name (e.g., `qwen3-8b`)
- For private serverless deployments, use your deployment name (e.g., `my-qwen3-8b`)

```bash
# Set the deployment name
export PREDIBASE_DEPLOYMENT="<DEPLOYMENT NAME>"
```
```bash
# Using a local adapter
curl -d '{
    "inputs": "What is machine learning?",
    "parameters": {
        "adapter_source": "pbase",
        "adapter_id": "<repository_name>/<version>",
        "max_new_tokens": 128
    }
}' \
    -H "Content-Type: application/json" \
    -X POST https://serving.app.predibase.com/$PREDIBASE_TENANT_ID/deployments/v2/llms/$PREDIBASE_DEPLOYMENT/generate \
    -H "Authorization: Bearer ${PREDIBASE_API_TOKEN}"
```
```bash
# Using a Hugging Face adapter
# (the "api_token" field is only required for private adapters)
curl -d '{
    "inputs": "What is your name?",
    "parameters": {
        "adapter_source": "hub",
        "adapter_id": "<org>/<adapter-name>",
        "api_token": "<HUGGINGFACE_API_TOKEN>",
        "max_new_tokens": 128
    }
}' \
    -H "Content-Type: application/json" \
    -X POST https://serving.app.predibase.com/$PREDIBASE_TENANT_ID/deployments/v2/llms/$PREDIBASE_DEPLOYMENT/generate \
    -H "Authorization: Bearer ${PREDIBASE_API_TOKEN}"
```
## Important Notes

- When querying fine-tuned models, include the prompt template used for fine-tuning in the `inputs`
- For streaming responses, use the `/generate_stream` endpoint instead of `/generate` (see the sketch below)
- Parameters follow the same format as the LoRAX generate endpoint
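As a rough sketch of the streaming variant: the request body is the same as for `/generate`; only the endpoint path changes, and the response arrives as a stream of server-sent events following LoRAX's streaming protocol.

```bash
# Streaming variant of the local-adapter request above:
# same payload, /generate_stream endpoint instead of /generate
curl -d '{
    "inputs": "What is machine learning?",
    "parameters": {
        "adapter_source": "pbase",
        "adapter_id": "<repository_name>/<version>",
        "max_new_tokens": 128
    }
}' \
    -H "Content-Type: application/json" \
    -X POST https://serving.app.predibase.com/$PREDIBASE_TENANT_ID/deployments/v2/llms/$PREDIBASE_DEPLOYMENT/generate_stream \
    -H "Authorization: Bearer ${PREDIBASE_API_TOKEN}"
```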
## Next Steps