# Deploy a Pretrained LLM
Predibase supports deploying any pretrained Large Language Model hosted on the HuggingFace Hub to a hosted endpoint for real-time inference.
Only VPC users with the Admin role will be able to deploy a pretrained LLM. Predibase Cloud users will have access to shared deployments without the need to manage any deployments themselves.
## Deploying via SDK
Currently, we support using the Python SDK / Command Line Interface (CLI) to manage LLM deployments. A UI version is in the works and will be available soon.
Prerequisites:

- Install the Predibase Python SDK.
- Select a pretrained text-generation model from the HuggingFace Hub.
To deploy an LLM to a hosted endpoint:
- Python SDK

```python
from predibase import PredibaseClient

pc = PredibaseClient()

llm = pc.LLM("hf://meta-llama/Llama-2-7b-chat-hf")
deployment = llm.deploy("my-first-llm").get()
```

- CLI

```shell
pbase deploy llm --deployment-name my-first-llm --model-name google/flan-t5-xl
```
## Testing the deployment
While LLMs can be queried via the UI, the SDK can also be used to programmatically verify that the deployment is up and ready for use.
- Python SDK

```python
from predibase import PredibaseClient

pc = PredibaseClient()

deployment = pc.LLM("pb://deployments/my-first-llm")
result = deployment.prompt("What is the capital of Italy?")
```

- CLI

```shell
pbase prompt llm -t "What is the capital of Italy?" --model-name my-first-llm
```
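A newly created deployment can take a few minutes to spin up, so a first prompt may fail while the endpoint initializes. One option is a small generic retry helper; `retry_until_ready` below is a sketch, not part of the Predibase SDK:

```python
import time


def retry_until_ready(fn, attempts=10, delay=30.0, exceptions=(Exception,)):
    """Call fn() repeatedly until it succeeds, sleeping between attempts.

    fn: zero-argument callable, e.g. lambda: deployment.prompt("...").
    Re-raises the last exception if every attempt fails.
    """
    last_err = None
    for _ in range(attempts):
        try:
            return fn()
        except exceptions as err:
            last_err = err
            time.sleep(delay)
    raise last_err
```

For example: `result = retry_until_ready(lambda: deployment.prompt("What is the capital of Italy?"))`.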
## Selecting an Engine Template
By default, your LLM will be deployed on an engine with a single Nvidia A10G GPU. This will be sufficient for serving most LLMs under 10 billion parameters. For larger models, you will want to upgrade to a larger engine type.
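The single-GPU default can be sanity-checked with a back-of-the-envelope estimate: model weights alone need roughly 2 bytes per parameter in fp16/bf16. The sketch below is a rule of thumb, not a Predibase API; the A10G's 24 GB of memory is an NVIDIA spec, not something stated above.

```python
def weight_mem_gb(params_billion, bytes_per_param=2):
    """Approximate memory (GB) for model weights alone.

    bytes_per_param: 2 for fp16/bf16, 4 for fp32. Activations and the
    KV cache need additional headroom on top of this.
    """
    return params_billion * bytes_per_param


# Llama-2-7b in fp16: ~14 GB of weights, fitting a single 24 GB A10G.
print(weight_mem_gb(7))   # 14
# A 34B model (~68 GB in fp16) exceeds one A10G and needs a larger engine.
print(weight_mem_gb(34))  # 68
```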
To deploy an LLM using a specific engine template:
- Python SDK

```python
from predibase import PredibaseClient

pc = PredibaseClient()

llm = pc.LLM("hf://meta-llama/Llama-2-7b-chat-hf")
deployment = llm.deploy("my-first-llm", engine_template="llm-gpu-large").get()
```

- CLI

```shell
pbase deploy llm --deployment-name my-first-llm --model-name google/flan-t5-xl --engine-template llm-gpu-large
```
### Available Engine Templates
| Engine Template | GPUs | GPU SKU | vCPUs | RAM | Disk |
|---|---|---|---|---|---|
| llm-gpu-small | 1 | A10G | 7810m | 29217Mi | 100Gi |
| llm-gpu-large | 4 | A10G | 47710m | 173300Mi | 400Gi |
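When scripting deployments, the table above can double as a lookup for choosing an engine template automatically. The helper below is a hypothetical illustration, not an SDK feature: `ENGINE_TEMPLATES` mirrors the table, and the assumed 24 GB of memory per A10G comes from NVIDIA's spec. It picks the smallest template whose total GPU memory covers an fp16 model's weights:

```python
# Assumed per-template GPU memory: 24 GB per A10G (NVIDIA spec),
# times the GPU counts from the table above.
ENGINE_TEMPLATES = {
    "llm-gpu-small": {"gpus": 1, "total_gpu_mem_gb": 24},
    "llm-gpu-large": {"gpus": 4, "total_gpu_mem_gb": 96},
}


def pick_template(params_billion, bytes_per_param=2):
    """Return the smallest template whose total GPU memory holds the weights."""
    needed_gb = params_billion * bytes_per_param
    for name, spec in ENGINE_TEMPLATES.items():  # smallest first
        if spec["total_gpu_mem_gb"] >= needed_gb:
            return name
    raise ValueError(f"No template fits a {params_billion}B-parameter model")


print(pick_template(7))   # llm-gpu-small (~14 GB of weights)
print(pick_template(34))  # llm-gpu-large (~68 GB of weights)
```

The chosen name can then be passed to `llm.deploy(..., engine_template=...)` as shown above.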
## Delete a Deployment
Deployments can be deleted via the SDK to free up compute resources:
- Python SDK

```python
from predibase import PredibaseClient

pc = PredibaseClient()

pc.LLM("pb://deployments/my-first-llm").delete()
```

- CLI

```shell
pbase delete llm --deployment-name my-first-llm
```