
Serverless Endpoints

Info: VPC customers do not have access to the shared serverless deployments and should start by deploying an LLM.

Prompt Base Models

Predibase offers a variety of base models as serverless deployments.

Note: When prompting through the SDK or REST API, we recommend including the model-specific instruction template; without it, responses may be noticeably worse. Prompts sent from the UI include these templates by default.

Prompting a base model takes a single generate call on a deployment client:

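# Setup assumed by this snippet: a Predibase SDK client created with your API token
from predibase import Predibase
pb = Predibase(api_token="<PREDIBASE API TOKEN>")

# Specify the serverless deployment by name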
lorax_client = pb.deployments.client("mistral-7b-instruct-v0-2")
print(lorax_client.generate("""<<SYS>>You are a helpful, respectful and honest assistant. Always answer as helpfully as possible, while being safe. Your answers should not include any harmful, unethical, racist, sexist, toxic, dangerous, or illegal content. Please ensure that your responses are socially unbiased and positive in nature. If a question does not make any sense, or is not factually coherent, explain why instead of answering something not correct. If you don't know the answer to a question, please don't share false information.<</SYS>>

[INST] What is the best pizza restaurant in New York? [/INST]""", max_new_tokens=100).generated_text)
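Since the instruction template has to be added by hand when using the SDK or REST API, it can help to build prompts with a small function rather than pasting the tags each time. The following is a minimal sketch: `build_prompt` is a hypothetical helper, not part of the Predibase SDK, and it only reproduces the `<<SYS>>` / `[INST]` layout used in the example above. The correct template depends on the model, so check the model's documentation before reusing it.

```python
def build_prompt(user_message: str, system_message: str = "") -> str:
    """Wrap a message in the <<SYS>> / [INST] layout shown in the example above.

    Hypothetical helper for illustration; the right template is model-specific.
    """
    parts = []
    if system_message:
        # Optional system block, placed before the instruction
        parts.append(f"<<SYS>>{system_message}<</SYS>>")
    # The user instruction itself
    parts.append(f"[INST] {user_message} [/INST]")
    # The example above separates the two blocks with a blank line
    return "\n\n".join(parts)


prompt = build_prompt(
    "What is the best pizza restaurant in New York?",
    system_message="You are a helpful, respectful and honest assistant.",
)
print(prompt)
```

The resulting string can be passed as the first argument to `lorax_client.generate(...)` in place of the inline prompt.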

Prompt Fine-tuned Models (with LoRAX)

If the base model is one of the available serverless endpoints, you can prompt your fine-tuned model immediately after training by passing its adapter_id to the same generate call, as shown below. If the base model is not available as a serverless endpoint, you will need to use a dedicated deployment to prompt your fine-tuned model.

# Specify the serverless deployment of the base model which was fine-tuned
lorax_client = pb.deployments.client("mistral-7b-instruct-v0-2")

# Specify your adapter_id as "adapter-repo-name/adapter-version-number"
print(lorax_client.generate("""<<SYS>>You are a helpful, respectful and honest assistant. Always answer as helpfully as possible, while being safe. Your answers should not include any harmful, unethical, racist, sexist, toxic, dangerous, or illegal content. Please ensure that your responses are socially unbiased and positive in nature. If a question does not make any sense, or is not factually coherent, explain why instead of answering something not correct. If you don't know the answer to a question, please don't share false information.<</SYS>>

[INST] What is the best pizza restaurant in New York? [/INST]""", adapter_id="adapter-repo-name/1", max_new_tokens=100).generated_text)
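The only difference between the base-model call and the fine-tuned call above is the `adapter_id` argument. A minimal sketch of that idea follows. Both `make_adapter_id` and `generation_kwargs` are hypothetical helpers, not part of the Predibase SDK, and the validation rules are assumptions made for this sketch.

```python
from typing import Optional


def make_adapter_id(repo_name: str, version: int) -> str:
    """Format an adapter reference as "adapter-repo-name/adapter-version-number"."""
    # Assumption of this sketch: the repo name is non-empty and contains no "/"
    if not repo_name or "/" in repo_name:
        raise ValueError("repo_name must be non-empty and must not contain '/'")
    return f"{repo_name}/{version}"


def generation_kwargs(
    adapter_repo: Optional[str] = None,
    adapter_version: Optional[int] = None,
    max_new_tokens: int = 100,
) -> dict:
    """Build keyword arguments for generate(); adapter_id is added only when an adapter is given."""
    kwargs = {"max_new_tokens": max_new_tokens}
    if adapter_repo is not None:
        if adapter_version is None:
            raise ValueError("adapter_version is required when adapter_repo is set")
        kwargs["adapter_id"] = make_adapter_id(adapter_repo, adapter_version)
    return kwargs


# Base model: no adapter_id
print(generation_kwargs())
# Fine-tuned model: same call plus adapter_id
print(generation_kwargs("adapter-repo-name", 1))
```

With a client from the examples above, the call would then look like `lorax_client.generate(prompt, **generation_kwargs("adapter-repo-name", 1))`.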

Inference on our serverless models is billed per token, and there is no upcharge for prompting a fine-tuned adapter. See the pricing page for details.