llm.deploy
Only VPC and Premium SaaS users with the Admin role will be able to deploy pretrained or finetuned LLMs. Predibase Cloud users will have access to serverless deployments without the need to manage any deployments themselves.
Deploy Pretrained LLM:
from predibase import PredibaseClient

pc = PredibaseClient()

# Pick a Hugging Face pretrained LLM to deploy.
# The URI must look something like "hf://meta-llama/Llama-2-7b-hf".
llm = pc.LLM(uri)
# Asynchronous deployment
llm.deploy(deployment_name, engine_template=None)
# Synchronous (blocking) deployment
llm.deploy(...).get()
Deploy Fine-tuned LLM:

# Get the fine-tuned LLM to deploy using pc.get_model
# (pc is a PredibaseClient, as in the pretrained example above).
# For example, pc.get_model(name="Llama-2-7b-hf-code_alpaca_800", version=3)
# returns model version #3 of the model repo named "Llama-2-7b-hf-code_alpaca_800"
model = pc.get_model(model_repo_name, optional_version_number)
# Asynchronous deployment
model.deploy(deployment_name, engine_template=None)
# Synchronous (blocking) deployment
model.deploy(...).get()
This method initiates deployment of your HuggingFaceLLM object in Predibase. Because this operation may take some time, the method by itself is asynchronous. However, users who want to track deployment progress immediately can chain .get() onto the call, which will block and provide incremental logs as the operation runs.
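For instance, the two call patterns look like this (a minimal sketch; my-llm is a placeholder deployment name):

# Non-blocking: returns an LLMDeploymentJob immediately.
job = llm.deploy(deployment_name="my-llm")

# ... do other work while the deployment spins up ...

# Block until the deployment is ready; returns an LLMDeployment.
llm_deployment = job.get()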
Parameters:
deployment_name: str
The name of the LLM deployment. This name will show up in the query editor UI for your team, and you will use it when prompting the LLM deployment through the SDK via the prompt method.
NOTE: To ensure your deployment is properly reachable, the deployment name you provide must be RFC 1123 subdomain compliant (see the validation sketch after this parameter list).
engine_template: Optional[str]
The size of the engine to deploy your LLM with. The current options are llm-gpu-small and llm-gpu-large. (Check out Available Engine Templates for more information on engine templates.) If left empty, Predibase will do the appropriate right-sizing and choose a suitable engine for the LLM you intend to deploy.
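Since deployment names must be RFC 1123 compliant, here is an illustrative, assumption-level check (not the SDK's own validation): an RFC 1123 label consists of lowercase letters, digits, and hyphens, and must start and end with a letter or digit.

import re

# Illustrative check for a single RFC 1123 label (not Predibase's
# validator; full subdomains also allow dot-separated labels and
# impose length limits).
RFC1123_LABEL = re.compile(r"^[a-z0-9]([a-z0-9-]*[a-z0-9])?$")

for name in ["llama-2-13b", "Llama_2_13B", "-bad-start"]:
    print(f"{name}: {'ok' if RFC1123_LABEL.match(name) else 'invalid'}")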
Returns:
- llm.deploy: an LLMDeploymentJob object
- llm.deploy(...).get(): an LLMDeployment object
Example Usage:
Deploy a pretrained LLM with the name llama-2-13b and the engine template llm-gpu-large:
llm = pc.LLM("hf://meta-llama/Llama-2-13b-hf")
llm_deployment = llm.deploy(deployment_name="llama-2-13b", engine_template="llm-gpu-large").get()
Predibase Deployment Links
After a deployment is initialized, Predibase will create a URI that points to it. You can pass this URI to pc.LLM to retrieve the deployment:
llm_deployment = pc.LLM("pb://deployments/deployment-name")
where deployment-name is the name you provided in the deploy command above.
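Putting it together, here is a sketch of retrieving the deployment from the earlier example by URI and prompting it (the prompt call's argument here is an assumption; see the prompt method's reference for the exact signature):

# Retrieve the deployment created above by its Predibase URI.
llm_deployment = pc.LLM("pb://deployments/llama-2-13b")

# Prompt it. The single positional string argument is an assumption;
# consult the prompt method's documentation for the full signature.
result = llm_deployment.prompt("Write a haiku about gradient descent.")
print(result)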
Supported OSS LLMs
The following open source pretrained LLMs are well-supported in Predibase:
- llama-2-7b: "hf://meta-llama/Llama-2-7b-hf"
- llama-2-7b-chat: "hf://meta-llama/Llama-2-7b-chat-hf"
- llama-2-7b-gptq: "hf://TheBloke/Llama-2-7B-GPTQ"
- llama-2-13b: "hf://meta-llama/Llama-2-13b-hf"
- llama-2-13b-chat: "hf://meta-llama/Llama-2-13b-chat-hf"
- llama-2-13b-gptq: "hf://TheBloke/Llama-2-13B-GPTQ"
- vicuna-13b: "hf://eachadea/vicuna-13b-1.1"
- nsql-350m: "hf://NumbersStation/nsql-350M"
- opt-350m: "hf://facebook/opt-350m"
- falcon-7b-instruct: "hf://tiiuae/falcon-7b-instruct"
You may also deploy any Hugging Face model that has the "Text Generation" and "Transformers" tags; however, you may encounter failures (for example, due to model size). We are continuing to work on adding a larger and more diverse set of LLMs to our supported list.
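For example, to try a model outside the list above (illustrative only; EleutherAI/gpt-neo-125m is simply a small model carrying the required tags, not a Predibase recommendation):

# Smaller models are more likely to deploy successfully.
llm = pc.LLM("hf://EleutherAI/gpt-neo-125m")
job = llm.deploy(deployment_name="gpt-neo-125m")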