Bring Your Own Model

Predibase allows you to run inference on HuggingFace fine-tuned adapters, merged models, and pretrained models.

Fine-tuned Adapter

To run inference on a fine-tuned adapter from HuggingFace, attach it to a base model deployment. Whichever base model deployment method you use, the instructions for running inference are the same. You'll need the following:

  • deployment name (e.g. for an adapter fine-tuned on mistral-7b, the deployment name is "mistral-7b" from our serverless models)
  • adapter path from HuggingFace (e.g. "predibase/tldr_headline_gen" for tldr_headline_gen)

Public Adapter

You can serve a custom fine-tuned adapter from HuggingFace (e.g. tldr_headline_gen) as follows:

from predibase import PredibaseClient

pc = PredibaseClient()  # assumes your Predibase API token is already configured

llm = pc.LLM("pb://deployments/<DEPLOYMENT NAME>")  # select your base model
adapter = pc.LLM("hf://predibase/tldr_headline_gen")  # select your adapter from HF
ft_llm = llm.with_adapter(adapter)  # attach your adapter to your base model

result = ft_llm.prompt("The following passage is content from a news report. Please summarize this passage in one sentence or less. Passage: Jeffrey Berns, CEO of Blockchains LLC, wants the Nevada government to allow companies like his to form local governments on land they own, granting them power over everything from schools to law enforcement. Berns envisions a city based on digital currencies and blockchain storage. His company is proposing to build a 15,000 home town 12 miles east of Reno. Nevada Lawmakers have responded with intrigue and skepticism. The proposed legislation has yet to be formally filed or discussed in public hearings. Summary: ", max_new_tokens=256)
print(result.response)
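The two strings passed to pc.LLM above follow a simple pattern: "pb://deployments/" plus the deployment name, and "hf://" plus the adapter path. As a minimal sketch, the helpers below build those strings and catch an obviously malformed adapter path before any request is made. The function names are hypothetical and not part of the Predibase SDK, and the "organization/name" check is an assumption based on how HuggingFace repository IDs are usually written.

```python
def deployment_uri(deployment_name: str) -> str:
    """Build the base model URI, e.g. 'pb://deployments/mistral-7b'."""
    if not deployment_name:
        raise ValueError("deployment name must not be empty")
    return f"pb://deployments/{deployment_name}"


def hf_adapter_uri(adapter_path: str) -> str:
    """Build the adapter URI, e.g. 'hf://predibase/tldr_headline_gen'.

    Assumes the adapter path has the form '<organization>/<adapter name>'.
    """
    org, sep, name = adapter_path.partition("/")
    if not (org and sep and name) or "/" in name:
        raise ValueError(
            f"expected '<organization>/<adapter name>', got {adapter_path!r}"
        )
    return f"hf://{adapter_path}"
```

The results can then be passed to pc.LLM in place of the hard-coded strings, for example pc.LLM(hf_adapter_uri("predibase/tldr_headline_gen")).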

Private Adapter

To run inference on your private adapter, you'll additionally need the following:

  • your Predibase API token (found on the Settings > My Profile page)
  • your tenant ID (found on the Settings > My Profile page)

Then send a request like the one below, passing your HuggingFace API token as api_token in the request parameters:

curl -d '{"inputs": "What is my name?", "parameters": {"api_token": "<HUGGINGFACE API TOKEN>", "adapter_source": "hub", "adapter_id": "<HF ORGANIZATION>/<HF ADAPTER NAME>", "max_new_tokens": 128}}' \
-H "Content-Type: application/json" \
-X POST https://serving.app.predibase.com/<PREDIBASE TENANT ID>/deployments/v2/llms/<DEPLOYMENT NAME>/generate \
-H "Authorization: Bearer <PREDIBASE API TOKEN>"
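For readers who prefer to call the endpoint from Python, here is a rough equivalent of the curl command above, using only the standard library. The URL pattern, headers, and JSON fields are copied from the curl example. The function names are hypothetical, and the sketch assumes the endpoint returns a JSON body, which is parsed and returned without interpreting its fields.

```python
import json
import urllib.request

SERVING_BASE = "https://serving.app.predibase.com"


def build_generate_request(tenant_id, deployment_name, predibase_token,
                           hf_token, adapter_id, prompt, max_new_tokens=128):
    """Return (url, headers, body) mirroring the curl command."""
    url = (f"{SERVING_BASE}/{tenant_id}/deployments/v2/llms/"
           f"{deployment_name}/generate")
    headers = {
        "Content-Type": "application/json",
        "Authorization": f"Bearer {predibase_token}",
    }
    payload = {
        "inputs": prompt,
        "parameters": {
            "api_token": hf_token,      # HuggingFace token for the private adapter
            "adapter_source": "hub",
            "adapter_id": adapter_id,   # "<HF ORGANIZATION>/<HF ADAPTER NAME>"
            "max_new_tokens": max_new_tokens,
        },
    }
    return url, headers, json.dumps(payload).encode("utf-8")


def generate(url, headers, body, timeout=60):
    """POST the request and return the parsed JSON response (assumed JSON)."""
    req = urllib.request.Request(url, data=body, headers=headers, method="POST")
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Keeping request construction separate from the network call makes it easy to inspect the URL and payload before sending anything, which helps when a placeholder has been left unfilled.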

Pretrained model or Fine-tuned model with merged weights

To serve a custom base or fine-tuned model from HuggingFace, you will need to use a dedicated deployment. Verify that it is supported and then deploy the model.

llm = pc.LLM("hf://mistralai/Mistral-7B-Instruct-v0.2")

# Public model
llm_deployment = llm.deploy(deployment_name="mistral-7b-dedicated").get()

# Private model, must include hf_token (which must have write access)
llm_deployment = llm.deploy(deployment_name="mistral-7b-dedicated", hf_token="<HUGGINGFACE API TOKEN>").get()

Then, prompt as normal.

Note that dedicated deployments are billed per GPU-hour (see pricing).