Serverless Endpoints
VPC customers do not have access to the shared serverless deployments and should instead start by creating a dedicated deployment of an LLM.
Prompt Base Models
Predibase supports a variety of base models as serverless deployments. Prompting a base model is as simple as:
Note: When prompting via the SDK or REST API, we recommend including the model-specific instruction template; otherwise you may see degraded results. Prompting in the UI applies these templates by default.
- Python SDK
- REST
from predibase import Predibase

# Initialize the Predibase client with your API token
pb = Predibase(api_token="<YOUR TOKEN HERE>")

# Specify the serverless deployment by name
lorax_client = pb.deployments.client("llama-2-7b-chat")
print(lorax_client.generate("""<<SYS>>You are a helpful, respectful and honest assistant. Always answer as helpfully as possible, while being safe. Your answers should not include any harmful, unethical, racist, sexist, toxic, dangerous, or illegal content. Please ensure that your responses are socially unbiased and positive in nature. If a question does not make any sense, or is not factually coherent, explain why instead of answering something not correct. If you don't know the answer to a question, please don't share false information.<</SYS>>
[INST] What is the best pizza restaurant in New York? [/INST]""", max_new_tokens=100).generated_text)
# Export environment variables
export PREDIBASE_API_TOKEN="<YOUR TOKEN HERE>" # Settings > My Profile > Generate API Token
export PREDIBASE_TENANT_ID="<YOUR TENANT ID>" # Settings > My Profile > Overview > Tenant ID
export PREDIBASE_DEPLOYMENT="llama-2-7b-chat"
# query the LLM deployment
curl -d '{"inputs": "What is your name?", "parameters": {"max_new_tokens": 256}}' \
-H "Content-Type: application/json" \
-X POST https://serving.app.predibase.com/$PREDIBASE_TENANT_ID/deployments/v2/llms/$PREDIBASE_DEPLOYMENT/generate \
-H "Authorization: Bearer ${PREDIBASE_API_TOKEN}"
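Because the instruction template must be supplied manually when prompting through the SDK or REST API, it can help to wrap it in a small helper. The sketch below is illustrative only (the `llama2_chat_prompt` function is not part of the Predibase SDK) and mirrors the Llama 2 chat template format used in the snippet above:

```python
def llama2_chat_prompt(system: str, user: str) -> str:
    """Wrap a system message and a user message in the Llama 2 chat
    template shown above. Illustrative helper, not part of the SDK."""
    return f"<<SYS>>{system}<</SYS>>\n[INST] {user} [/INST]"

# Build a prompt and pass it to the LoRAX client, e.g.:
#   lorax_client.generate(prompt, max_new_tokens=100)
prompt = llama2_chat_prompt(
    "You are a helpful, respectful and honest assistant.",
    "What is the best pizza restaurant in New York?",
)
```

Keeping the template in one place avoids subtle formatting drift between prompts sent from different parts of your code.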
Prompt Fine-tuned Models (with LoRAX)
If the base model is available as a serverless endpoint, you can prompt your fine-tuned model immediately after training by passing your adapter_id, as shown below. If the base model is not listed above, you will need to use a dedicated deployment to prompt your fine-tuned model.
- Python SDK
- REST
# Specify the serverless deployment of the base model which was fine-tuned
lorax_client = pb.deployments.client("llama-2-7b-chat")
# Specify your adapter_id as "adapter-repo-name/adapter-version-number"
print(lorax_client.generate("""<<SYS>>You are a helpful, respectful and honest assistant. Always answer as helpfully as possible, while being safe. Your answers should not include any harmful, unethical, racist, sexist, toxic, dangerous, or illegal content. Please ensure that your responses are socially unbiased and positive in nature. If a question does not make any sense, or is not factually coherent, explain why instead of answering something not correct. If you don't know the answer to a question, please don't share false information.<</SYS>>
[INST] What is the best pizza restaurant in New York? [/INST]""", adapter_id="adapter-repo-name/1", max_new_tokens=100).generated_text)
# Export environment variables
export PREDIBASE_API_TOKEN="<YOUR TOKEN HERE>" # Settings > My Profile > Generate API Token
export PREDIBASE_TENANT_ID="<YOUR TENANT ID>" # Settings > My Profile > Overview > Tenant ID
export PREDIBASE_DEPLOYMENT="llama-2-7b-chat"
# query the LLM deployment
curl -d "{\"inputs\": \"What is your name?\", \"parameters\": {\"api_token\": \"${PREDIBASE_API_TOKEN}\", \"adapter_source\": \"pbase\", \"adapter_id\": \"<finetuned_model_repo_name>/<model_version>\", \"max_new_tokens\": 256}}" \
-H "Content-Type: application/json" \
-X POST https://serving.app.predibase.com/$PREDIBASE_TENANT_ID/deployments/v2/llms/$PREDIBASE_DEPLOYMENT/generate \
-H "Authorization: Bearer ${PREDIBASE_API_TOKEN}"
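Escaping the adapter parameters inside a shell string is error-prone, so the request body can also be assembled programmatically before POSTing it to the endpoint. This is a hedged sketch (the `adapter_generate_payload` helper is illustrative, not part of the Predibase SDK); it builds the same JSON body the curl command above sends:

```python
import json

def adapter_generate_payload(prompt: str, adapter_id: str, api_token: str,
                             max_new_tokens: int = 256) -> str:
    """Build the JSON body for the /generate endpoint when prompting a
    Predibase-hosted adapter. Illustrative helper, not part of the SDK."""
    return json.dumps({
        "inputs": prompt,
        "parameters": {
            "api_token": api_token,     # token used to fetch the private adapter
            "adapter_source": "pbase",  # adapter is stored in Predibase
            "adapter_id": adapter_id,   # "<finetuned_model_repo_name>/<model_version>"
            "max_new_tokens": max_new_tokens,
        },
    })
```

The returned string can be sent as the request body (with the `Content-Type: application/json` and `Authorization: Bearer` headers) using any HTTP client.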
See REST API for more parameters.
Inference on our serverless models is billed per token, and there is no upcharge for prompting a fine-tuned adapter. See pricing for details.