Bring Your Own Model

Predibase allows you to run inference on HuggingFace fine-tuned adapters, merged models, and pretrained models.

Fine-tuned Adapter

To run inference on a fine-tuned adapter from HuggingFace, you can use either a serverless deployment or a dedicated deployment of the base model.

The instructions for running inference are the same for both deployment methods. You'll need the following:

  • deployment name (e.g. for a fine-tuned mistral-7b model, the deployment name is "mistral-7b" from our serverless models)
  • adapter path from HuggingFace (e.g. "predibase/tldr_headline_gen" for tldr_headline_gen)

Public Adapter

You can serve a custom fine-tuned adapter from HuggingFace (e.g. tldr_headline_gen) as follows:

# Assumes the Predibase Python SDK is installed; `pb` is an authenticated client.
from predibase import Predibase

pb = Predibase(api_token="<PREDIBASE API TOKEN>")
lorax_client = pb.deployments.client("mistral-7b-instruct")

print(lorax_client.generate(
    "The following passage is content from a news report. Please summarize this passage in one sentence or less. Passage: Jeffrey Berns, CEO of Blockchains LLC, wants the Nevada government to allow companies like his to form local governments on land they own, granting them power over everything from schools to law enforcement. Berns envisions a city based on digital currencies and blockchain storage. His company is proposing to build a 15,000 home town 12 miles east of Reno. Nevada Lawmakers have responded with intrigue and skepticism. The proposed legislation has yet to be formally filed or discussed in public hearings. Summary: ",
    adapter_id="predibase/tldr_headline_gen",
    adapter_source="hub",
    max_new_tokens=256,
).generated_text)

Private Adapter

To run inference on your private adapter, you'll additionally need your HuggingFace API token, passed as the api_token argument:

# `pb` is an authenticated Predibase client.
lorax_client = pb.deployments.client("mistral-7b-instruct")

print(lorax_client.generate(
    "The following passage is content from a news report. Please summarize this passage in one sentence or less. Passage: Jeffrey Berns, CEO of Blockchains LLC, wants the Nevada government to allow companies like his to form local governments on land they own, granting them power over everything from schools to law enforcement. Berns envisions a city based on digital currencies and blockchain storage. His company is proposing to build a 15,000 home town 12 miles east of Reno. Nevada Lawmakers have responded with intrigue and skepticism. The proposed legislation has yet to be formally filed or discussed in public hearings. Summary: ",
    adapter_id="predibase/tldr_headline_gen",
    adapter_source="hub",
    api_token="<HUGGINGFACE API TOKEN>",
    max_new_tokens=256,
).generated_text)
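
The public and private calls differ only in the api_token argument, so the keyword arguments can be assembled in one place. The following is a minimal sketch, not part of the Predibase SDK: build_adapter_kwargs and the HF_TOKEN environment variable name are hypothetical, and the argument names are taken from the examples above.

```python
import os
from typing import Optional


def build_adapter_kwargs(adapter_id: str,
                         hf_token: Optional[str] = None,
                         max_new_tokens: int = 256) -> dict:
    """Assemble keyword arguments for generate() (hypothetical helper)."""
    kwargs = {
        "adapter_id": adapter_id,
        "adapter_source": "hub",
        "max_new_tokens": max_new_tokens,
    }
    # Only private adapters need a HuggingFace token.
    if hf_token:
        kwargs["api_token"] = hf_token
    return kwargs


# Public adapter: no token is included.
public_kwargs = build_adapter_kwargs("predibase/tldr_headline_gen")

# Private adapter: token read from the HF_TOKEN environment variable
# (assumed name). If the variable is unset, no api_token is included.
private_kwargs = build_adapter_kwargs(
    "predibase/tldr_headline_gen",
    hf_token=os.environ.get("HF_TOKEN"),
)
```

The resulting dictionary can then be unpacked into the call, for example lorax_client.generate(prompt, **private_kwargs).generated_text.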

Pretrained model

To serve a custom base model from HuggingFace, you will need to use a dedicated deployment. Verify that it is supported as a "Best-effort LLM" and then deploy the model.

  1. Deploy a base model as a dedicated deployment.
  2. Prompt as normal.

Note that dedicated deployments are billed per GPU-hour (see pricing).
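
As a rough illustration of GPU-hour billing, the cost of a deployment can be estimated as rate × GPUs × hours. This is a sketch only: the function name is hypothetical, the rate used below is a placeholder rather than an actual Predibase price, and it assumes the charge scales linearly with the number of GPUs.

```python
def estimate_deployment_cost(rate_per_gpu_hour: float,
                             num_gpus: int,
                             hours: float) -> float:
    """Estimate cost in dollars, assuming linear per-GPU-hour billing."""
    if rate_per_gpu_hour < 0 or num_gpus < 0 or hours < 0:
        raise ValueError("rate, GPU count, and hours must be non-negative")
    return rate_per_gpu_hour * num_gpus * hours


# Placeholder rate of $2.50 per GPU-hour, 1 GPU, running for 8 hours.
cost = estimate_deployment_cost(2.50, num_gpus=1, hours=8)
print(f"${cost:.2f}")  # prints "$20.00"
```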

Fine-tuned model with merged weights

If you are looking for a private, dedicated instance of your fine-tuned adapter, we recommend deploying a base model and using LoRAX to run inference on your adapter. LoRAX enables you to serve an unlimited number of adapters on a single base model.
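
Serving several adapters on one base model amounts to reusing a single client and changing adapter_id per request. The sketch below assumes the client exposes generate(prompt, adapter_id=..., adapter_source=..., max_new_tokens=...) returning an object with generated_text, as in the examples above. generate_with_adapters, the stub client, and the second adapter name are hypothetical and exist only so the example runs without credentials.

```python
from types import SimpleNamespace
from typing import Dict, Iterable


def generate_with_adapters(client, prompt: str,
                           adapter_ids: Iterable[str],
                           max_new_tokens: int = 256) -> Dict[str, str]:
    """Run one prompt through several adapters on the same base model."""
    results = {}
    for adapter_id in adapter_ids:
        response = client.generate(
            prompt,
            adapter_id=adapter_id,
            adapter_source="hub",
            max_new_tokens=max_new_tokens,
        )
        results[adapter_id] = response.generated_text
    return results


class _StubClient:
    """Stand-in for a real deployment client (illustration only)."""

    def generate(self, prompt, adapter_id, adapter_source, max_new_tokens):
        return SimpleNamespace(generated_text=f"[{adapter_id}] output")


outputs = generate_with_adapters(
    _StubClient(),
    "Summarize: ...",
    ["predibase/tldr_headline_gen", "my-org/my-second-adapter"],
)
for adapter_id, text in outputs.items():
    print(adapter_id, "->", text)
```

With a real deployment, the stub would be replaced by the client returned from pb.deployments.client(...).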

If you would still like a dedicated deployment of your fine-tuned model with merged weights, we can serve it for you. Reach out to support@predibase.com.