Bring Your Own Model
Predibase allows you to run inference on HuggingFace fine-tuned adapters, merged models, and pretrained models.
Fine-tuned Adapter
To run inference on a fine-tuned adapter from HuggingFace:
- Get started with a shared serverless endpoint (shared deployments, free with rate limits)
- Serve in production with a private serverless base model deployment (private instance, priced by $/gpu-hour)
For either base model deployment method, instructions for running inference are the same. You'll need the following:
- deployment name (ex. for a fine-tuned mistral-7b model, the deployment name is "mistral-7b" from our shared serverless models or the name of your private serverless deployment)
- adapter path from HuggingFace (ex. "predibase/tldr_headline_gen" for tldr_headline_gen)
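These two pieces are also what the REST examples on this page combine into the serving endpoint URL. As a minimal sketch (the helper function name and placeholder values are illustrative, not part of any SDK), the URL is assembled like this:

```python
def generate_url(tenant_id: str, deployment_name: str) -> str:
    # URL pattern used by the Predibase serving REST endpoint; both
    # values come from your Predibase account.
    return (
        "https://serving.app.predibase.com/"
        f"{tenant_id}/deployments/v2/llms/{deployment_name}/generate"
    )

print(generate_url("<PREDIBASE TENANT ID>", "mistral-7b"))
```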
Public Adapter
You can serve a custom fine-tuned adapter from HuggingFace (e.g. tldr_headline_gen) as shown below:
Python SDK

```python
# Assumes the Predibase client has already been initialized, e.g.:
# from predibase import Predibase
# pb = Predibase(api_token="<PREDIBASE API TOKEN>")

lorax_client = pb.deployments.client("mistral-7b-instruct")
print(lorax_client.generate(
    "The following passage is content from a news report. Please summarize this passage in one sentence or less. Passage: Jeffrey Berns, CEO of Blockchains LLC, wants the Nevada government to allow companies like his to form local governments on land they own, granting them power over everything from schools to law enforcement. Berns envisions a city based on digital currencies and blockchain storage. His company is proposing to build a 15,000 home town 12 miles east of Reno. Nevada Lawmakers have responded with intrigue and skepticism. The proposed legislation has yet to be formally filed or discussed in public hearings. Summary: ",
    adapter_id="predibase/tldr_headline_gen",
    adapter_source="hub",
    max_new_tokens=256,
).generated_text)
```
REST

You'll need the following:
- your Predibase API token (found on the Settings > My Profile page)
- your tenant ID (found on the Settings > My Profile page)

```shell
curl -d '{"inputs": "What is your name?", "parameters": {"adapter_id": "predibase/tldr_headline_gen", "adapter_source": "hub"}}' \
    -H "Content-Type: application/json" \
    -X POST https://serving.app.predibase.com/<PREDIBASE TENANT ID>/deployments/v2/llms/<DEPLOYMENT NAME>/generate \
    -H "Authorization: Bearer <API TOKEN>"
```
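The same call can be issued from Python's standard library. The sketch below only constructs the request without sending it; "my-tenant" and the bearer token are placeholder values standing in for your real tenant ID and Predibase API token:

```python
import json
from urllib.request import Request

# Same JSON body as the curl command above.
payload = {
    "inputs": "What is your name?",
    "parameters": {"adapter_id": "predibase/tldr_headline_gen",
                   "adapter_source": "hub"},
}

req = Request(
    url="https://serving.app.predibase.com/my-tenant/deployments/v2/llms/mistral-7b-instruct/generate",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json",
             "Authorization": "Bearer <API TOKEN>"},
    method="POST",
)
# Send with urllib.request.urlopen(req) once real values are filled in.
print(req.method, req.full_url)
```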
Private Adapter
To run inference on your private adapter, you'll additionally need the following:
- your HuggingFace API token (the token must have write access)
Python SDK

```python
# Assumes the Predibase client has already been initialized, e.g.:
# from predibase import Predibase
# pb = Predibase(api_token="<PREDIBASE API TOKEN>")

lorax_client = pb.deployments.client("mistral-7b-instruct")
print(lorax_client.generate(
    "The following passage is content from a news report. Please summarize this passage in one sentence or less. Passage: Jeffrey Berns, CEO of Blockchains LLC, wants the Nevada government to allow companies like his to form local governments on land they own, granting them power over everything from schools to law enforcement. Berns envisions a city based on digital currencies and blockchain storage. His company is proposing to build a 15,000 home town 12 miles east of Reno. Nevada Lawmakers have responded with intrigue and skepticism. The proposed legislation has yet to be formally filed or discussed in public hearings. Summary: ",
    adapter_id="predibase/tldr_headline_gen",
    adapter_source="hub",
    api_token="<HUGGINGFACE API TOKEN>",
    max_new_tokens=256,
).generated_text)
```
REST

You'll need the following:
- your Predibase API token (found on the Settings > My Profile page)
- your tenant ID (found on the Settings > My Profile page)

```shell
curl -d '{"inputs": "What is my name?", "parameters": {"api_token": "<HUGGINGFACE API TOKEN>", "adapter_source": "hub", "adapter_id": "<HF ORGANIZATION>/<HF ADAPTER NAME>", "max_new_tokens": 128}}' \
    -H "Content-Type: application/json" \
    -X POST https://serving.app.predibase.com/<PREDIBASE TENANT ID>/deployments/v2/llms/<DEPLOYMENT NAME>/generate \
    -H "Authorization: Bearer <PREDIBASE API TOKEN>"
```
Pretrained Model
To serve a custom base model from HuggingFace, you will need to use a private serverless deployment. Verify that it is supported as a "Best-effort LLM" and then deploy the model.
Note that private serverless deployments are billed by $/gpu-hour. (See pricing)
Fine-tuned Model with Merged Weights
If you are looking for a private, dedicated instance of your fine-tuned adapter, we recommend deploying a base model and using LoRAX to run inference on your adapter. LoRAX enables you to serve an unlimited number of adapters on a single base model.
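Because LoRAX selects the adapter per request, serving many adapters simply means varying `adapter_id` in each call's parameters while the deployment stays fixed. A minimal sketch of two request payloads against one deployment (the second adapter name is hypothetical):

```python
import json

# One base deployment (e.g. "mistral-7b-instruct") can serve many
# adapters: each request names the adapter it wants in "parameters".
def make_payload(prompt: str, adapter_id: str) -> str:
    return json.dumps({
        "inputs": prompt,
        "parameters": {"adapter_id": adapter_id, "adapter_source": "hub"},
    })

# Two different adapters, same deployment endpoint:
a = make_payload("Summarize: ...", "predibase/tldr_headline_gen")
b = make_payload("Classify: ...", "my-org/sentiment-lora")  # hypothetical adapter
```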
If you would still like to have a private serverless deployment of your fine-tuned model, we are able to serve it for you -- reach out to support@predibase.com.