Bring Your Own Model
Predibase allows you to run inference on fine-tuned adapters, merged models, and pretrained models from HuggingFace.
Fine-tuned Adapter
To run inference on a fine-tuned adapter from HuggingFace, you can use either:
- a serverless endpoint (shared deployments, priced by tokens) - Recommended for getting started
- a dedicated base model deployment (private instance that you first deploy yourself, priced by $/gpu-hour)
For either deployment method, the instructions for running inference are the same. You'll need the following:
- the deployment name (e.g., for a fine-tuned mistral-7b model, the deployment name is "mistral-7b" from our serverless models)
- the adapter path from HuggingFace (e.g., "predibase/tldr_headline_gen" for tldr_headline_gen)
Public Adapter
You can serve a custom fine-tuned adapter from HuggingFace (e.g., tldr_headline_gen) as shown below:
- Python SDK
- REST
from predibase import Predibase

# Initialize the SDK client with your Predibase API token.
pb = Predibase(api_token="<PREDIBASE API TOKEN>")

lorax_client = pb.deployments.client("mistral-7b-instruct")
print(lorax_client.generate(
    "The following passage is content from a news report. Please summarize this passage in one sentence or less. Passage: Jeffrey Berns, CEO of Blockchains LLC, wants the Nevada government to allow companies like his to form local governments on land they own, granting them power over everything from schools to law enforcement. Berns envisions a city based on digital currencies and blockchain storage. His company is proposing to build a 15,000 home town 12 miles east of Reno. Nevada Lawmakers have responded with intrigue and skepticism. The proposed legislation has yet to be formally filed or discussed in public hearings. Summary: ",
    adapter_id="predibase/tldr_headline_gen",
    adapter_source="hub",
    max_new_tokens=256,
).generated_text)
You'll need the following:
- your Predibase API token (Found on the Settings > My Profile page)
- your tenant ID (Found on the Settings > My Profile page)
curl -d '{"inputs": "What is your name?", "parameters": {"adapter_id": "predibase/tldr_headline_gen", "adapter_source": "hub"}}' \
-H "Content-Type: application/json" \
-X POST https://serving.app.predibase.com/<PREDIBASE TENANT ID>/deployments/v2/llms/<DEPLOYMENT NAME>/generate \
-H "Authorization: Bearer <API TOKEN>"
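The same request can be assembled in any language. As an illustrative sketch (the tenant ID, deployment name, and token values are placeholders you must replace), here is how the endpoint URL and JSON payload from the curl command above fit together in Python:

```python
import json

# Placeholders -- substitute your own values.
TENANT_ID = "<PREDIBASE TENANT ID>"
DEPLOYMENT = "mistral-7b"

# Endpoint URL, matching the curl command above.
url = (
    "https://serving.app.predibase.com/"
    f"{TENANT_ID}/deployments/v2/llms/{DEPLOYMENT}/generate"
)

# Request body: the prompt plus the adapter parameters.
payload = json.dumps({
    "inputs": "What is your name?",
    "parameters": {
        "adapter_id": "predibase/tldr_headline_gen",
        "adapter_source": "hub",
    },
})
```

POST `payload` to `url` with your `Authorization: Bearer <API TOKEN>` header, exactly as in the curl command.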
Private Adapter
To run inference on your private adapter, you'll additionally need the following:
- HuggingFace API token - Token must have read access to the adapter repository.
- Python SDK
- REST
from predibase import Predibase

# Initialize the SDK client with your Predibase API token.
pb = Predibase(api_token="<PREDIBASE API TOKEN>")

lorax_client = pb.deployments.client("mistral-7b-instruct")
print(lorax_client.generate(
    "The following passage is content from a news report. Please summarize this passage in one sentence or less. Passage: Jeffrey Berns, CEO of Blockchains LLC, wants the Nevada government to allow companies like his to form local governments on land they own, granting them power over everything from schools to law enforcement. Berns envisions a city based on digital currencies and blockchain storage. His company is proposing to build a 15,000 home town 12 miles east of Reno. Nevada Lawmakers have responded with intrigue and skepticism. The proposed legislation has yet to be formally filed or discussed in public hearings. Summary: ",
    adapter_id="predibase/tldr_headline_gen",
    adapter_source="hub",
    api_token="<HUGGINGFACE API TOKEN>",  # HuggingFace token for the private adapter
    max_new_tokens=256,
).generated_text)
You'll need the following:
- your Predibase API token (Found on the Settings > My Profile page)
- your tenant ID (Found on the Settings > My Profile page)
curl -d '{"inputs": "What is my name?", "parameters": {"api_token": "<HUGGINGFACE API TOKEN>", "adapter_source": "hub", "adapter_id": "<HF ORGANIZATION>/<HF ADAPTER NAME>", "max_new_tokens": 128}}' \
-H "Content-Type: application/json" \
-X POST https://serving.app.predibase.com/<PREDIBASE TENANT ID>/deployments/v2/llms/<DEPLOYMENT NAME>/generate \
-H "Authorization: Bearer <PREDIBASE API TOKEN>"
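Compared with the public-adapter request, the only additions are the HuggingFace token inside `parameters` and your Predibase token in the Authorization header. A minimal Python sketch of the same request, built with the standard library but not actually sent (all bracketed values are placeholders):

```python
import json
import urllib.request

PREDIBASE_TOKEN = "<PREDIBASE API TOKEN>"  # placeholder
HF_TOKEN = "<HUGGINGFACE API TOKEN>"       # placeholder

url = (
    "https://serving.app.predibase.com/<PREDIBASE TENANT ID>"
    "/deployments/v2/llms/<DEPLOYMENT NAME>/generate"
)

body = json.dumps({
    "inputs": "What is my name?",
    "parameters": {
        "api_token": HF_TOKEN,
        "adapter_source": "hub",
        "adapter_id": "<HF ORGANIZATION>/<HF ADAPTER NAME>",
        "max_new_tokens": 128,
    },
}).encode("utf-8")

# Construct (but do not send) the POST request.
req = urllib.request.Request(
    url,
    data=body,
    headers={
        "Content-Type": "application/json",
        "Authorization": f"Bearer {PREDIBASE_TOKEN}",
    },
    method="POST",
)
# To send it: urllib.request.urlopen(req)
```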
Pretrained Model
To serve a custom base model from HuggingFace, you will need to use a dedicated deployment. Verify that it is supported as a "Best-effort LLM" and then deploy the model.
Note that dedicated deployments are billed by $/gpu-hour. (See pricing)
Fine-tuned Model with Merged Weights
If you are looking for a private, dedicated instance of your fine-tuned adapter, we recommend deploying a base model and using LoRAX to run inference on your adapter. LoRAX enables you to serve an unlimited number of adapters on a single base model.
If you would still like to have a dedicated deployment of your fine-tuned model, we are able to serve it for you -- reach out to support@predibase.com.
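To make the LoRAX point concrete: because the adapter is selected per request via `adapter_id`, switching adapters is just a change to the request parameters, with no new deployment. A sketch of building request bodies for several adapters against one shared deployment (the `my-org/...` adapter names are hypothetical):

```python
# One shared base deployment; the adapter is chosen per request.
DEPLOYMENT = "mistral-7b-instruct"

def build_request(prompt: str, adapter_id: str) -> dict:
    """Build a generate-request body targeting a specific adapter."""
    return {
        "inputs": prompt,
        "parameters": {
            "adapter_id": adapter_id,
            "adapter_source": "hub",
            "max_new_tokens": 128,
        },
    }

# Adapters served from the same base deployment
# (the last two names are hypothetical).
adapters = [
    "predibase/tldr_headline_gen",
    "my-org/customer-support",
    "my-org/sql-generator",
]
request_bodies = [build_request("Summarize: ...", a) for a in adapters]
```

Each body in `request_bodies` would be POSTed to the same deployment endpoint; only `adapter_id` differs.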