Quickstart

Predibase provides the fastest way to fine-tune and serve open-source LLMs. It's built on top of LoRAX, an open-source framework for serving many fine-tuned models.

Inference: Use the Python SDK, the REST API, or the web Playground to prompt serverless endpoints.
Fine-tuning: Fine-tune and serve a model in a few steps using the SDK or the UI.

Run inference using the SDK or REST

1. Create a Predibase account.
2. In Settings > My Profile, click Generate API Token.
3. Install the Python SDK: pip install -U predibase
4. Check which serverless deployments are available. (VPC customers must first create a dedicated deployment.)
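Rather than pasting the API token into source code, you may prefer to keep it in environment variables, using the same names as the REST example in this guide (PREDIBASE_API_TOKEN, PREDIBASE_TENANT_ID, PREDIBASE_DEPLOYMENT). The sketch below is a minimal helper under that assumption; `load_settings` and `PredibaseSettings` are hypothetical names, not part of the Predibase SDK. It fails early with a clear message if the token is missing and falls back to the deployment name used in this guide.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class PredibaseSettings:
    """Hypothetical container for the values used by the examples in this guide."""
    api_token: str
    tenant_id: Optional[str]  # only needed for the REST endpoints
    deployment: str


def load_settings(env: Mapping[str, str] = os.environ) -> PredibaseSettings:
    """Read Predibase settings from environment variables (names taken from the REST example)."""
    token = env.get("PREDIBASE_API_TOKEN")
    if not token:
        raise RuntimeError(
            "PREDIBASE_API_TOKEN is not set; generate one under Settings > My Profile."
        )
    return PredibaseSettings(
        api_token=token,
        tenant_id=env.get("PREDIBASE_TENANT_ID"),
        # Default to the serverless deployment used throughout this guide
        deployment=env.get("PREDIBASE_DEPLOYMENT", "mistral-7b-instruct-v0-2"),
    )
```

With the helper in place, `Predibase(api_token=load_settings().api_token)` can replace the hard-coded token placeholder in the SDK examples.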

Python SDK:

from predibase import Predibase

pb = Predibase(api_token="<PREDIBASE API TOKEN>")

# Get a client for a serverless deployment (insert the deployment name here)
lorax_client = pb.deployments.client("mistral-7b-instruct-v0-2")
resp = lorax_client.generate("[INST] What are some popular tourist spots in San Francisco? [/INST]")
print(resp.generated_text)

REST:

# Export environment variables
export PREDIBASE_API_TOKEN="<YOUR TOKEN HERE>" # Settings > My Profile > Generate API Token
export PREDIBASE_TENANT_ID="<YOUR TENANT ID>" # Settings > My Profile > Overview > Tenant ID
export PREDIBASE_DEPLOYMENT="mistral-7b-instruct-v0-2"

# Query the LLM deployment
curl -d '{"inputs": "[INST] What are some popular tourist spots in San Francisco? [/INST]"}' \
    -H "Content-Type: application/json" \
    -X POST https://serving.app.predibase.com/$PREDIBASE_TENANT_ID/deployments/v2/llms/$PREDIBASE_DEPLOYMENT/generate \
    -H "Authorization: Bearer ${PREDIBASE_API_TOKEN}"
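If you want to call the REST endpoint from Python without installing the SDK, the curl request can be reproduced with the standard library. This is a minimal sketch: `build_generate_request` is a hypothetical helper, and it only constructs the request (URL, headers, and JSON body from the curl command); it does not send anything.

```python
import json
import urllib.request

BASE_URL = "https://serving.app.predibase.com"  # taken from the curl example


def build_generate_request(
    tenant_id: str, deployment: str, api_token: str, prompt: str, stream: bool = False
) -> urllib.request.Request:
    """Build (but do not send) a POST request to the generate or generate_stream endpoint."""
    route = "generate_stream" if stream else "generate"
    url = f"{BASE_URL}/{tenant_id}/deployments/v2/llms/{deployment}/{route}"
    body = json.dumps({"inputs": prompt}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_token}",
        },
    )
```

To send it, pass the request to `urllib.request.urlopen` and decode the JSON body. The SDK's `resp.generated_text` suggests the output is in a `generated_text` field, but treat that as an assumption and check the actual response from your deployment.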

Note the explicit use of special tokens before and after the prompt. These are used with instruction- and chat-tuned models to improve response quality. See Instruction Templates for details on how these should be applied for each of the serverless model endpoints.
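For the Mistral instruct deployment used in these examples, the prompt is wrapped in `[INST]` and `[/INST]`. A small helper keeps that wrapping in one place. This sketch covers only the single-turn format shown in this guide; other serverless models use different templates, so check Instruction Templates before reusing it. `mistral_instruct_prompt` is a hypothetical name.

```python
def mistral_instruct_prompt(user_message: str) -> str:
    """Wrap a single user message in the [INST] ... [/INST] markers used in this guide.

    Only the single-turn Mistral format from the examples is handled; other models
    (and multi-turn chats) need their own templates.
    """
    return f"[INST] {user_message.strip()} [/INST]"


prompt = mistral_instruct_prompt("What are some popular tourist spots in San Francisco?")
print(prompt)  # [INST] What are some popular tourist spots in San Francisco? [/INST]
```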

Streaming

Python SDK:

from predibase import Predibase

pb = Predibase(api_token="<PREDIBASE API TOKEN>")
lorax_client = pb.deployments.client("mistral-7b-instruct-v0-2")  # Insert deployment name here

# Print tokens as they arrive, skipping special tokens such as end-of-sequence markers
for resp in lorax_client.generate_stream("[INST] What are some popular tourist spots in San Francisco? [/INST]"):
    if not resp.token.special:
        print(resp.token.text, end="", flush=True)

REST:

# Export environment variables
export PREDIBASE_API_TOKEN="<YOUR TOKEN HERE>" # Settings > My Profile > Generate API Token
export PREDIBASE_TENANT_ID="<YOUR TENANT ID>" # Settings > My Profile > Overview > Tenant ID
export PREDIBASE_DEPLOYMENT="mistral-7b-instruct-v0-2"

# Query the LLM deployment's streaming endpoint
curl -d '{"inputs": "[INST] What are some popular tourist spots in San Francisco? [/INST]"}' \
    -H "Content-Type: application/json" \
    -X POST https://serving.app.predibase.com/$PREDIBASE_TENANT_ID/deployments/v2/llms/$PREDIBASE_DEPLOYMENT/generate_stream \
    -H "Authorization: Bearer ${PREDIBASE_API_TOKEN}"
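The generate_stream endpoint sends tokens incrementally rather than as a single JSON response. The sketch below assumes a server-sent-events format in which each event is a line beginning with `data:` followed by a JSON object whose `token` field carries `text` and `special`, the same fields the SDK loop reads. `iter_stream_tokens` and `collect_stream_text` are hypothetical helpers that mirror the SDK loop by skipping special tokens; verify the event format against your deployment before relying on them.

```python
import json
from typing import Iterable, Iterator


def iter_stream_tokens(lines: Iterable[str]) -> Iterator[str]:
    """Yield non-special token text from server-sent-event lines.

    Assumes each event line looks like: data:{"token": {"text": "...", "special": false}, ...}
    Blank keep-alive lines and other non-data lines are ignored.
    """
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue
        # json.loads tolerates an optional space after "data:"
        event = json.loads(line[len("data:"):])
        token = event.get("token") or {}
        if not token.get("special", False):
            yield token.get("text", "")


def collect_stream_text(lines: Iterable[str]) -> str:
    """Join the streamed tokens into the full generated text."""
    return "".join(iter_stream_tokens(lines))
```

When reading a streaming HTTP response, you could decode each line of the body and feed it to `iter_stream_tokens`, printing each piece with `end=""` and `flush=True` as the SDK example does.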

Next steps

Try out the full example to fine-tune and prompt an adapter in Predibase using the SDK.
Don't want to write code? Use the UI to connect a dataset and start fine-tuning an adapter.
Coming from OpenAI? Check out our migration guides for serving.
Explore additional complete examples.
See how Predibase integrates with other frameworks in the ecosystem.

Get in touch

Reach out to us at support@predibase.com or join us on Discord for any questions, comments, or feedback!
