Predibase supports using fine-tuned adapters to customize model behavior. You can:

  • Upload local PEFT adapters
  • Use adapters from Hugging Face Hub
  • Deploy adapters with any compatible base model

Quick Start

First, install the Predibase Python SDK:

pip install -U predibase
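
Then initialize the SDK client with your API token. A minimal sketch, assuming the token is stored in a PREDIBASE_API_TOKEN environment variable:

import os

from predibase import Predibase

# Initialize the SDK client; assumes your API token is stored in the
# PREDIBASE_API_TOKEN environment variable
pb = Predibase(api_token=os.environ["PREDIBASE_API_TOKEN"])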

Adapter Sources and IDs

Adapters can come from three sources:

1. Predibase Adapters

You can fine-tune adapters on Predibase using the SDK or UI. Once trained, an adapter can be prompted through any deployment of the base model it was fine-tuned from, with no need to pre-load the adapter or recreate the deployment. All deployments support adapter inference out of the box.

from predibase import Predibase

pb = Predibase(api_token="<PREDIBASE_API_TOKEN>")

# Specify the deployment of the base model
client = pb.deployments.client("my-qwen3-8b")

# Specify your adapter_id as "adapter-repo-name/adapter-version-number"
print(client.generate("hello", adapter_id="repo-name/1", max_new_tokens=100).generated_text)

# Using a specific adapter checkpoint
print(client.generate("hello", adapter_id="repo-name/1@7", max_new_tokens=100).generated_text)

2. Upload Local Adapters

Import an adapter trained outside of Predibase for inference. Your local adapter directory must follow the PEFT format:

/path/to/adapter/
    ├── adapter_config.json                             # Configuration file
    └── adapter_model.safetensors or adapter_model.bin  # Model weights
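
If you trained with the Hugging Face peft library, save_pretrained writes exactly this layout. A minimal sketch, assuming a LoRA fine-tune of a causal LM (the base model name and LoRA settings below are illustrative, not prescribed by Predibase):

from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

# Illustrative base model and LoRA settings: substitute your own
base = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-7B")
lora = LoraConfig(r=8, lora_alpha=16, target_modules=["q_proj", "v_proj"])
model = get_peft_model(base, lora)

# ... training happens here ...

# Writes adapter_config.json and adapter_model.safetensors
model.save_pretrained("./my_adapter_weights")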

Upload and prompt the adapter:

from predibase import Predibase

pb = Predibase(api_token="<PREDIBASE_API_TOKEN>")

# Upload adapter weights
adapter = pb.adapters.upload(
    local_dir="./my_adapter_weights",
    repo="repo-name",
)

# Specify the deployment of the base model
client = pb.deployments.client("my-qwen3-8b")

# Specify your adapter_id as "adapter-repo-name/adapter-version-number"
print(client.generate("hello", adapter_id="repo-name/1", max_new_tokens=100).generated_text)

3. Hugging Face Hub Adapters

When using adapters from Hugging Face:

  • Adapter ID format: "organization/adapter-name"
  • Examples: "predibase/tldr_headline_gen", "predibase/mistral-instruct"
  • Must specify adapter_source="hub"

Public Adapters

# Use a public adapter from Hugging Face
client = pb.deployments.client("my-qwen3-8b")
response = client.generate(
    "The following passage is content from a news report. Please summarize this passage in one sentence or less. Passage: Jeffrey Berns, CEO of Blockchains LLC, wants the Nevada government to allow companies like his to form local governments on land they own, granting them power over everything from schools to law enforcement. Berns envisions a city based on digital currencies and blockchain storage. His company is proposing to build a 15,000 home town 12 miles east of Reno. Nevada Lawmakers have responded with intrigue and skepticism. The proposed legislation has yet to be formally filed or discussed in public hearings. Summary: ",
    adapter_id="<org>/<public-adapter>",  # Hugging Face public adapter path
    adapter_source="hub",                     # Specify Hub as source
    max_new_tokens=256
)

Private Adapters

To run inference on your private adapter, you’ll additionally need:

  • A Hugging Face API token with write access

# Use a private adapter from Hugging Face
client = pb.deployments.client("my-qwen3-8b")
response = client.generate(
    "The following passage is content from a news report. Please summarize this passage in one sentence or less. Passage: Jeffrey Berns, CEO of Blockchains LLC, wants the Nevada government to allow companies like his to form local governments on land they own, granting them power over everything from schools to law enforcement. Berns envisions a city based on digital currencies and blockchain storage. His company is proposing to build a 15,000 home town 12 miles east of Reno. Nevada Lawmakers have responded with intrigue and skepticism. The proposed legislation has yet to be formally filed or discussed in public hearings. Summary: ",
    adapter_id="predibase/tldr_headline_gen",
    adapter_source="hub",
    api_token="<HUGGINGFACE_API_TOKEN>",  # Required for private adapters
    max_new_tokens=256
)

With REST API

Access adapters through the REST API for language-agnostic integration. First, set up your environment variables:

# Set your credentials
export PREDIBASE_API_TOKEN="<YOUR TOKEN HERE>"
export PREDIBASE_TENANT_ID="<YOUR TENANT ID>"

For PREDIBASE_DEPLOYMENT, the deployment's base model must match the model the adapter was fine-tuned from:

  • For shared LLMs, use the model name (e.g., “qwen3-8b”)
  • For private serverless deployments, use your deployment name (e.g., “my-qwen3-8b”)

# Set the deployment name
export PREDIBASE_DEPLOYMENT="<DEPLOYMENT NAME>"

# Using a local adapter
curl -d '{
    "inputs": "What is machine learning?",
    "parameters": {
        "adapter_source": "pbase",
        "adapter_id": "<repository_name>/<version>",
        "max_new_tokens": 128
    }
}' \
    -H "Content-Type: application/json" \
    -X POST https://serving.app.predibase.com/$PREDIBASE_TENANT_ID/deployments/v2/llms/$PREDIBASE_DEPLOYMENT/generate \
    -H "Authorization: Bearer ${PREDIBASE_API_TOKEN}"

# Using a Hugging Face adapter (api_token is required only for private adapters)
curl -d '{
    "inputs": "What is your name?",
    "parameters": {
        "adapter_source": "hub",
        "adapter_id": "<org>/<adapter-name>",
        "api_token": "<HUGGINGFACE_API_TOKEN>",
        "max_new_tokens": 128
    }
}' \
    -H "Content-Type: application/json" \
    -X POST https://serving.app.predibase.com/$PREDIBASE_TENANT_ID/deployments/v2/llms/$PREDIBASE_DEPLOYMENT/generate \
    -H "Authorization: Bearer ${PREDIBASE_API_TOKEN}"

Important Notes

  • When querying fine-tuned models, include the prompt template used for fine-tuning in the inputs
  • For streaming responses, use the /generate_stream endpoint instead of /generate (see the sketch after this list)
  • Parameters follow the same format as the LoRAX generate endpoint
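
As a hedged illustration of the streaming note above, here is a sketch that reads server-sent events from /generate_stream, assuming LoRAX-style events where each line is "data: {json}" with a token payload (the exact event schema is an assumption here):

import json
import os

import requests

url = (
    "https://serving.app.predibase.com/"
    f"{os.environ['PREDIBASE_TENANT_ID']}/deployments/v2/llms/"
    f"{os.environ['PREDIBASE_DEPLOYMENT']}/generate_stream"
)
with requests.post(
    url,
    headers={"Authorization": f"Bearer {os.environ['PREDIBASE_API_TOKEN']}"},
    json={
        "inputs": "What is machine learning?",
        "parameters": {
            "adapter_source": "pbase",
            "adapter_id": "<repository_name>/<version>",
            "max_new_tokens": 128,
        },
    },
    stream=True,
) as resp:
    for line in resp.iter_lines():
        # Assumes LoRAX-style SSE: each event line is "data: {json}"
        if line and line.startswith(b"data:"):
            event = json.loads(line[len(b"data:"):])
            print(event["token"]["text"], end="", flush=True)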

Next Steps