Only VPC and Premium SaaS users with the Admin role can deploy pretrained or fine-tuned LLMs. Predibase Cloud users have access to serverless deployments and do not need to manage any deployments themselves.

# Pick a Hugging Face pretrained LLM to deploy.
# The URI must look like "hf://meta-llama/Llama-2-7b-hf".
llm = pc.LLM(uri)

# Asynchronous deployment
llm.deploy(deployment_name, engine_template=None)

# Synchronous (blocking) deployment
llm.deploy(deployment_name, engine_template=None).get()

This method initiates deployment of your HuggingFaceLLM object in Predibase. Because the operation may take some time, the method itself is asynchronous. To track deployment progress immediately, chain .get() onto the call, which blocks and provides incremental logs of the operation.


deployment_name: str

The name of the LLM deployment. This name appears in the UI for your team in the query editor, and you use it when prompting the LLM deployment through the SDK via the prompt method.

NOTE: To ensure your deployment is properly reachable, the deployment name you provide must be RFC1123 subdomain compliant.
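
A deployment name can be checked locally before calling deploy. The sketch below is an assumption, not part of the Predibase SDK: it follows the common Kubernetes reading of an RFC 1123 subdomain (lowercase alphanumerics, '-' and '.', starting and ending with an alphanumeric, at most 253 characters). The helper name is_rfc1123_subdomain is hypothetical, and Predibase may enforce stricter rules than this.

```python
import re

# One DNS label: lowercase alphanumerics and '-', starting and ending
# with an alphanumeric character.
_LABEL = r"[a-z0-9]([-a-z0-9]*[a-z0-9])?"
# A subdomain is one or more labels joined by dots.
_SUBDOMAIN_RE = re.compile(rf"{_LABEL}(\.{_LABEL})*")
# Assumed overall length limit (the Kubernetes convention).
_MAX_LENGTH = 253


def is_rfc1123_subdomain(name: str) -> bool:
    """Return True if `name` looks like an RFC 1123 subdomain.

    Hypothetical helper; it is not provided by the Predibase SDK.
    """
    return len(name) <= _MAX_LENGTH and _SUBDOMAIN_RE.fullmatch(name) is not None


if __name__ == "__main__":
    for candidate in ["llama-2-13b", "Llama_2_13B", "-llama", "my.llm"]:
        print(candidate, is_rfc1123_subdomain(candidate))
```

Under these rules a name such as llama-2-13b passes, while names with uppercase letters, underscores, or a leading hyphen are rejected.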

engine_template: Optional[str]

The size of engine you want to deploy your LLM with. The current options are llm-gpu-small and llm-gpu-large. (Check out Available Engine Templates for more information on engine templates.) If left empty, Predibase will do the appropriate right-sizing and choose a suitable engine for the LLM you intend to deploy.


llm.deploy: An LLMDeploymentJob object
llm.deploy.get: An LLMDeployment object

Example Usage:

Deploy a pretrained LLM with the name llama-2-13b and the engine template llm-gpu-large

llm = pc.LLM("hf://meta-llama/Llama-2-13b-hf")
llm_deployment = llm.deploy(deployment_name="llama-2-13b", engine_template="llm-gpu-large").get()

After a deployment is initialized, Predibase creates a URI of the form pb://deployments/deployment-name that points to it. You can pass this URI to pc.LLM:

llm_deployment = pc.LLM("pb://deployments/deployment-name")

where deployment-name is the name you provided in the deploy command above.
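
The mapping between a deployment name and its URI is simple string formatting. As a minimal sketch, assuming only the pb://deployments/ prefix shown above, the two hypothetical helpers below (not part of the Predibase SDK) convert in both directions:

```python
# Prefix taken from the URI form shown in this section.
PB_DEPLOYMENT_PREFIX = "pb://deployments/"


def deployment_uri(deployment_name: str) -> str:
    """Build the URI for a deployment name (hypothetical helper)."""
    return f"{PB_DEPLOYMENT_PREFIX}{deployment_name}"


def deployment_name_from_uri(uri: str) -> str:
    """Extract the deployment name from a pb://deployments/ URI."""
    if not uri.startswith(PB_DEPLOYMENT_PREFIX):
        raise ValueError(f"not a deployment URI: {uri!r}")
    name = uri[len(PB_DEPLOYMENT_PREFIX):]
    if not name:
        raise ValueError("deployment URI has an empty name")
    return name


if __name__ == "__main__":
    print(deployment_uri("llama-2-13b"))  # pb://deployments/llama-2-13b
```

The resulting string is what you would pass to pc.LLM to obtain a handle on the deployment.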

Supported OSS LLMs

The following open source pretrained LLMs are well-supported in Predibase:

  • llama-2-7b: "hf://meta-llama/Llama-2-7b-hf"
  • llama-2-7b-chat: "hf://meta-llama/Llama-2-7b-chat-hf"
  • llama-2-7b-gptq: "hf://TheBloke/Llama-2-7B-GPTQ"
  • llama-2-13b: "hf://meta-llama/Llama-2-13b-hf"
  • llama-2-13b-chat: "hf://meta-llama/Llama-2-13b-chat-hf"
  • llama-2-13b-gptq: "hf://TheBloke/Llama-2-13B-GPTQ"
  • vicuna-13b: "hf://eachadea/vicuna-13b-1.1"
  • nsql-350m: "hf://NumbersStation/nsql-350M"
  • opt-350m: "hf://facebook/opt-350m"
  • falcon-7b-instruct: "hf://tiiuae/falcon-7b-instruct"

You may also deploy any Hugging Face model that has the "Text Generation" and "Transformer" tags, but you may encounter failures (for example, due to model size). We are continuing to add larger and more diverse LLMs to our supported list.
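
The short names and URIs in the supported list can be kept in a small lookup table so that scripts refer to models by short name. This is a sketch only: the dictionary copies the list in this section, and resolve_model_uri is a hypothetical helper, not part of the Predibase SDK. Strings that already start with hf:// are passed through unchanged, since other Hugging Face models may also be deployable.

```python
# Short name -> Hugging Face URI, copied from the supported list in this section.
SUPPORTED_LLMS = {
    "llama-2-7b": "hf://meta-llama/Llama-2-7b-hf",
    "llama-2-7b-chat": "hf://meta-llama/Llama-2-7b-chat-hf",
    "llama-2-7b-gptq": "hf://TheBloke/Llama-2-7B-GPTQ",
    "llama-2-13b": "hf://meta-llama/Llama-2-13b-hf",
    "llama-2-13b-chat": "hf://meta-llama/Llama-2-13b-chat-hf",
    "llama-2-13b-gptq": "hf://TheBloke/Llama-2-13B-GPTQ",
    "vicuna-13b": "hf://eachadea/vicuna-13b-1.1",
    "nsql-350m": "hf://NumbersStation/nsql-350M",
    "opt-350m": "hf://facebook/opt-350m",
    "falcon-7b-instruct": "hf://tiiuae/falcon-7b-instruct",
}


def resolve_model_uri(name_or_uri: str) -> str:
    """Return an hf:// URI for a supported short name or an explicit URI."""
    if name_or_uri.startswith("hf://"):
        # Explicit URI: pass through without checking the supported list.
        return name_or_uri
    try:
        return SUPPORTED_LLMS[name_or_uri]
    except KeyError:
        known = ", ".join(sorted(SUPPORTED_LLMS))
        raise ValueError(
            f"unknown model {name_or_uri!r}; known names: {known}"
        ) from None


if __name__ == "__main__":
    print(resolve_model_uri("llama-2-13b"))  # hf://meta-llama/Llama-2-13b-hf
```

The returned URI can then be passed to pc.LLM, as in the example usage earlier in this section.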