Quickstart
Predibase provides a fast way to fine-tune and serve open-source LLMs. It is built on top of the open-source LoRAX serving framework.
- Fine-tuning: Fine-tune and serve a model in just a few steps using the SDK or UI
- Shared endpoints: Try the Python SDK or the Web Playground to prompt serverless endpoints for quick iteration and prototyping
- Production-ready private serverless inference: Deploy your base model to serve an unlimited number of adapters using the SDK or UI
Run inference using the SDK or REST
- Create an account here.
- Navigate to the Settings page and click Generate API Token.
- Set up a venv and install the Python SDK (if running locally; not required for Google Colab or similar):
python3.9 -m venv .venv
source .venv/bin/activate
pip install -U predibase
- See available shared deployments. (Note: VPC customers will need to first deploy a private serverless deployment.)
- Python SDK (first snippet below)
- REST (second snippet below)
from predibase import Predibase
pb = Predibase(api_token="<PREDIBASE API TOKEN>")
# Optionally get a list of available models by calling pb.deployments.list()
lorax_client = pb.deployments.client("mistral-7b-instruct-v0-2") # Insert deployment name here
resp = lorax_client.generate("[INST] What are some popular tourist spots in San Francisco? [/INST]")
print(resp.generated_text)
# Export environment variables
export PREDIBASE_API_TOKEN="<YOUR TOKEN HERE>" # Settings > My Profile > Generate API Token
export PREDIBASE_TENANT_ID="<YOUR TENANT ID>" # Settings > My Profile > Overview > Tenant ID
export PREDIBASE_DEPLOYMENT="mistral-7b-instruct-v0-2"
# query the LLM deployment
curl -d '{"inputs": "[INST] What are some popular tourist spots in San Francisco? [/INST]"}' \
-H "Content-Type: application/json" \
-X POST https://serving.app.predibase.com/$PREDIBASE_TENANT_ID/deployments/v2/llms/$PREDIBASE_DEPLOYMENT/generate \
-H "Authorization: Bearer ${PREDIBASE_API_TOKEN}"
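For readers who would rather issue the REST call from Python without the SDK, the curl command above can be sketched with the standard library. The helper name `build_generate_request` is hypothetical and not part of the Predibase SDK; the URL layout, headers, and `inputs` field are copied from the curl example. The request is only constructed here, not sent.

```python
import json
import urllib.request

# Host taken from the curl example above.
SERVING_URL = "https://serving.app.predibase.com"


def build_generate_request(tenant_id, deployment, api_token, prompt):
    """Build (but do not send) the POST request shown in the curl example.

    Hypothetical helper for illustration only; not a Predibase SDK function.
    """
    url = f"{SERVING_URL}/{tenant_id}/deployments/v2/llms/{deployment}/generate"
    body = json.dumps({"inputs": prompt}).encode("utf-8")
    headers = {
        "Content-Type": "application/json",
        "Authorization": f"Bearer {api_token}",
    }
    return urllib.request.Request(url, data=body, headers=headers, method="POST")
```

Passing the returned object to `urllib.request.urlopen` would send the request. Reading the token and tenant ID from the `PREDIBASE_API_TOKEN` and `PREDIBASE_TENANT_ID` environment variables keeps credentials out of source files.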
Note the explicit use of special tokens before and after the prompt. These are used with instruction- and chat-tuned models to improve response quality. See Instruction Templates for details on how these should be applied for each of the serverless model endpoints.
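As a minimal sketch of the wrapping described above, a small helper can add the `[INST]` and `[/INST]` tokens used in this quickstart's Mistral example. The function name `format_mistral_instruct` is hypothetical, and other models are likely to need different templates, so treat this as an illustration for this one deployment only.

```python
def format_mistral_instruct(prompt: str) -> str:
    """Wrap a user prompt in the [INST] ... [/INST] tokens used in this quickstart.

    Hypothetical helper; it only reproduces the format shown in the examples
    for mistral-7b-instruct-v0-2 and is not a general-purpose template.
    """
    return f"[INST] {prompt.strip()} [/INST]"
```

The result can be passed directly as the prompt string in the SDK call or as the `inputs` field of the REST request.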
Streaming
- Python SDK (first snippet below)
- REST (second snippet below)
from predibase import Predibase
pb = Predibase(api_token="<PREDIBASE API TOKEN>")
lorax_client = pb.deployments.client("mistral-7b-instruct-v0-2") # Insert deployment name here
for resp in lorax_client.generate_stream("[INST] What are some popular tourist spots in San Francisco? [/INST]"):
    # Skip special tokens (such as end-of-sequence markers) and print text as it arrives
    if not resp.token.special:
        print(resp.token.text, sep="", end="", flush=True)
# Export environment variables
export PREDIBASE_API_TOKEN="<YOUR TOKEN HERE>" # Settings > My Profile > Generate API Token
export PREDIBASE_TENANT_ID="<YOUR TENANT ID>" # Settings > My Profile > Overview > Tenant ID
export PREDIBASE_DEPLOYMENT="mistral-7b-instruct-v0-2"
# query the LLM deployment
curl -d '{"inputs": "[INST] What are some popular tourist spots in San Francisco? [/INST]"}' \
-H "Content-Type: application/json" \
-X POST https://serving.app.predibase.com/$PREDIBASE_TENANT_ID/deployments/v2/llms/$PREDIBASE_DEPLOYMENT/generate_stream \
-H "Authorization: Bearer ${PREDIBASE_API_TOKEN}"
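When consuming the streaming endpoint outside the SDK, the response has to be decoded chunk by chunk. The sketch below assumes the stream arrives as server-sent events, with each event on a `data:` line carrying a JSON object whose `token.text` and `token.special` fields mirror the attributes used in the SDK example. That wire format is an assumption and should be checked against a real response; the function name `iter_stream_text` and the sample lines in the docstring are made up for illustration.

```python
import json
from typing import Iterable, Iterator


def iter_stream_text(lines: Iterable[str]) -> Iterator[str]:
    """Yield the text of each non-special token from SSE-style lines.

    Assumed input, one event per line, for example:
        data:{"token": {"text": "Golden", "special": false}}
    """
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank separator lines and any other SSE fields
        payload = json.loads(line[len("data:"):])
        token = payload.get("token") or {}
        if not token.get("special", False):
            yield token.get("text", "")
```

Joining the yielded pieces with `"".join(...)` reconstructs the generated text in the same way the SDK loop prints it token by token.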
Next steps
- Try out the full example to fine-tune and prompt an adapter in Predibase using the SDK
- Don't want to code at all? Use the UI to connect a dataset and start fine-tuning an adapter.
- Coming from OpenAI? Check out our migration guides for serving
- Explore additional complete examples
- See how Predibase integrates with other frameworks in the ecosystem
Get in touch
Reach out to us at support@predibase.com or join us on Discord for any questions, comments, or feedback!