Bring Your Own Model
Predibase allows you to bring your own models and adapters from your local machine or external repositories like HuggingFace.
Upload a custom adapter from your local machine
Use the Predibase SDK to upload your adapter to Predibase:
pb.adapters.upload("/path/to/adapter", repo="my_repo", base_model="llama-3-1-8b")
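This assumes an initialized SDK client pb. A minimal setup sketch (the token placeholder is yours to fill in):
from predibase import Predibase

# Initialize the SDK client once; the other snippets on this page reuse this `pb` object.
pb = Predibase(api_token="<PREDIBASE API TOKEN>")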
Note that the base_model here should ideally map to one of Predibase's officially supported models for the best experience. If your adapter's base model is not on that list, you can provide the HuggingFace model ID instead.
The adapter directory on your local machine is expected to follow the PEFT format and contain the following files:
/path/to/adapter
/adapter_config.json
/adapter_model.safetensors
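For reference, this is the layout the peft library writes by default. A minimal sketch (the model name and LoRA settings below are illustrative, not requirements):
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

# Wrap a base model with a LoRA adapter; r and target_modules are placeholders.
base = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-3.1-8B")
model = get_peft_model(base, LoraConfig(r=16, target_modules=["q_proj", "v_proj"]))
# ... fine-tune the model ...
model.save_pretrained("/path/to/adapter")  # writes adapter_config.json and adapter_model.safetensors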
To run inference on your uploaded adapter:
- Get started with a shared endpoint (public endpoints, free with rate limits)
- Serve in production with a private serverless deployment (private instance, priced by $/gpu-hour)
For either base model deployment method, instructions for running inference are the same. You'll need the following:
- deployment name (e.g. for the adapter fine-tuned on llama-3-1-8b above, use the shared deployment "llama-3-1-8b", or the name of your private serverless deployment)
- adapter repo and version in Predibase (e.g. "my_repo/1" for the example above)
Python SDK:
lorax_client = pb.deployments.client("llama-3-1-8b")
print(lorax_client.generate(
    "...",
    adapter_id="my_repo/1",
    max_new_tokens=256,
).generated_text)
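The client can also stream tokens as they are generated. A sketch, assuming the same deployment and adapter as above and the LoRAX client's generate_stream interface:
# Print tokens as they arrive instead of waiting for the full completion.
for response in lorax_client.generate_stream("...", adapter_id="my_repo/1", max_new_tokens=256):
    if not response.token.special:
        print(response.token.text, end="")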
REST:
You'll also need the following:
- your Predibase API token (found on the Settings > My Profile page)
- your tenant ID (found on the Settings > My Profile page)
curl -d '{"inputs": "...", "parameters": {"adapter_id": "my_repo/1"}}' \
-H "Content-Type: application/json" \
-X POST https://serving.app.predibase.com/<PREDIBASE TENANT ID>/deployments/v2/llms/<DEPLOYMENT NAME>/generate \
-H "Authorization: Bearer <API TOKEN>"
Prompt an adapter from HuggingFace
You can prompt a custom fine-tuned adapter from HuggingFace (e.g. tldr_headline_gen) as shown below:
Python SDK:
lorax_client = pb.deployments.client("mistral-7b-instruct")
print(lorax_client.generate(
    "The following passage is content from a news report. Please summarize this passage in one sentence or less. Passage: Jeffrey Berns, CEO of Blockchains LLC, wants the Nevada government to allow companies like his to form local governments on land they own, granting them power over everything from schools to law enforcement. Berns envisions a city based on digital currencies and blockchain storage. His company is proposing to build a 15,000 home town 12 miles east of Reno. Nevada Lawmakers have responded with intrigue and skepticism. The proposed legislation has yet to be formally filed or discussed in public hearings. Summary: ",
    adapter_id="predibase/tldr_headline_gen",
    adapter_source="hub",
    max_new_tokens=256,
).generated_text)
REST:
You'll also need the following:
- your Predibase API token (found on the Settings > My Profile page)
- your tenant ID (found on the Settings > My Profile page)
curl -d '{"inputs": "What is your name?", "parameters": {"adapter_id": "predibase/tldr_headline_gen", "adapter_source": "hub"}}' \
-H "Content-Type: application/json" \
-X POST https://serving.app.predibase.com/<PREDIBASE TENANT ID>/deployments/v2/llms/<DEPLOYMENT NAME>/generate \
-H "Authorization: Bearer <API TOKEN>"
Private adapters
To run inference on your private adapter, you'll additionally need the following:
- HuggingFace API token (the token must have write access)
Python SDK:
lorax_client = pb.deployments.client("mistral-7b-instruct")
print(lorax_client.generate(
    "The following passage is content from a news report. Please summarize this passage in one sentence or less. Passage: Jeffrey Berns, CEO of Blockchains LLC, wants the Nevada government to allow companies like his to form local governments on land they own, granting them power over everything from schools to law enforcement. Berns envisions a city based on digital currencies and blockchain storage. His company is proposing to build a 15,000 home town 12 miles east of Reno. Nevada Lawmakers have responded with intrigue and skepticism. The proposed legislation has yet to be formally filed or discussed in public hearings. Summary: ",
    adapter_id="<HF ORGANIZATION>/<HF ADAPTER NAME>",
    adapter_source="hub",
    api_token="<HUGGINGFACE API TOKEN>",
    max_new_tokens=256,
).generated_text)
REST:
You'll also need the following:
- your Predibase API token (found on the Settings > My Profile page)
- your tenant ID (found on the Settings > My Profile page)
curl -d '{"inputs": "What is my name?", "parameters": {"api_token": "<HUGGINGFACE API TOKEN>", "adapter_source": "hub", "adapter_id": "<HF ORGANIZATION>/<HF ADAPTER NAME>", "max_new_tokens": 128}}' \
-H "Content-Type: application/json" \
-X POST https://serving.app.predibase.com/<PREDIBASE TENANT ID>/deployments/v2/llms/<DEPLOYMENT NAME>/generate \
-H "Authorization: Bearer <PREDIBASE API TOKEN>"
Deploy a custom model from HuggingFace
To serve a custom base model from HuggingFace, you will need to use a private serverless deployment. Verify that it is supported as a "Best-effort LLM" and then deploy the model.
Note that private serverless deployments are billed per GPU-hour (see pricing).
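With the SDK, creating the private serverless deployment might look like the sketch below. The deployment name and model ID are placeholders, and the exact DeploymentConfig fields may differ across SDK versions; check the SDK reference before relying on them:
from predibase import DeploymentConfig

# Sketch: deploy a best-effort custom model from HuggingFace as a private serverless deployment.
pb.deployments.create(
    name="my-custom-model",  # placeholder deployment name
    config=DeploymentConfig(
        base_model="<HF ORGANIZATION>/<HF MODEL NAME>",  # HuggingFace model ID
        min_replicas=0,
        max_replicas=1,
    ),
)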
If you would like a private serverless deployment of a custom model we don't yet support, reach out to support@predibase.com and we'll get it working for you.