Dedicated Deployments

Deploying Base Models

Private, dedicated instances are helpful if you're expecting significant request traffic. Predibase supports serving base models from Hugging Face. See available models. Dedicated deployments are billed by GPU-time.

Info: For base_model, you'll need the model's Hugging Face path. The paths for the models we officially support are listed in the Predibase docs.

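from predibase import Predibase, DeploymentConfig

pb = Predibase(api_token="<PREDIBASE API TOKEN>")  # Replace with your API token

pb.deployments.create(
    name="my-mistral-7b-instruct",
    config=DeploymentConfig(
        base_model="mistralai/Mistral-7B-Instruct-v0.2",
        # cooldown_time=3600,  # Value in seconds; defaults to 0, which means the deployment is always on
    ),
    # description="",  # Optional
)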

Note: Dedicated deployments are always on by default. To change this, modify the cooldown_time parameter.
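
To get a feel for how cooldown_time might affect billed GPU-time, here is a rough sketch. It assumes, beyond what this page states, that a deployment with a nonzero cooldown_time scales down after that many idle seconds and comes back up on the next request, and it ignores cold-start time. The function billed_seconds and its parameters are illustrative only and are not part of the Predibase SDK.

```python
def billed_seconds(request_times, cooldown_time, window_end):
    """Estimate how many seconds a deployment is up within [0, window_end].

    Assumptions (not from the Predibase docs): the deployment comes up on a
    request, stays up until `cooldown_time` seconds pass without a request,
    and cold-start time is ignored. cooldown_time=0 means always on.
    All request times are expected to fall within [0, window_end].
    """
    if cooldown_time == 0:
        return window_end  # always on: up for the whole window

    total = 0
    up_since = None  # start of the current "up" interval
    up_until = None  # time at which the deployment would scale down if idle
    for t in sorted(request_times):
        if up_until is not None and t <= up_until:
            # Request arrived while the deployment was up: extend the interval.
            up_until = t + cooldown_time
        else:
            if up_since is not None:
                total += up_until - up_since  # close the previous interval
            up_since, up_until = t, t + cooldown_time
    if up_since is not None:
        # Clip the last interval to the end of the observation window.
        total += min(up_until, window_end) - up_since
    return total


# Three requests over one day with a one-hour cooldown.
day = 24 * 3600
print(billed_seconds([0, 1000, 10000], cooldown_time=3600, window_end=day))
print(billed_seconds([0, 1000, 10000], cooldown_time=0, window_end=day))
```

Under these assumptions, sparse traffic with a cooldown is up for far less time than an always-on deployment, at the cost of cold starts that the sketch does not model.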

Dedicated fine-tuned adapter deployments

If you are looking for a private, dedicated instance of your fine-tuned adapter, we recommend deploying a base model (above) and using LoRAX to run inference on your adapter. LoRAX enables you to serve an unlimited number of adapters on a single base model.

If you would still like to have a dedicated deployment of your fine-tuned model, we are able to serve it for you -- reach out to support@predibase.com.

Customize Compute

By default, Predibase automatically right-sizes the deployment, choosing a suitable accelerator for the LLM you intend to deploy. You may also specify an accelerator yourself.

pb.deployments.create(
    name="my-mistral-7b-instruct",
    config=DeploymentConfig(
        base_model="mistralai/Mistral-7B-Instruct-v0.2",
        accelerator="a10_24gb_100",
        # cooldown_time=3600,  # Value in seconds; defaults to 0, which means the deployment is always on
    ),
)

Available Accelerators

Certain accelerators are only available for certain tiers. Please reach out to sales@predibase.com if you're interested in upgrading to Enterprise or deploying on different hardware. See our pricing.

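| Accelerator    | ID            | Predibase Tiers                 | GPUs | GPU SKU |
|----------------|---------------|---------------------------------|------|---------|
| 1 A10G 24GB    | a10_24gb_100  | Developer                       | 1    | A10G    |
| 4 A10G 24GB    | a10_24gb_400  | Enterprise (VPC)                | 4    | A10G    |
| 0.25 A100 80GB | a100_80gb_025 | Enterprise (Predibase AI Cloud) | 0.25 | A100    |
| 0.5 A100 80GB  | a100_80gb_050 | Enterprise (Predibase AI Cloud) | 0.50 | A100    |
| 1 A100 80GB    | a100_80gb_100 | Enterprise (Predibase AI Cloud) | 1    | A100    |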
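
If you pick accelerators programmatically, the table can be mirrored as a small lookup. This is only a sketch: the dictionary copies the rows above by hand, and the helper accelerators_for_tier is a hypothetical name, not an SDK function. Treat the docs table as the source of truth, since availability may change.

```python
# Hand-copied from the accelerator table; not fetched from Predibase.
ACCELERATORS = {
    "a10_24gb_100": {"name": "1 A10G 24GB", "tier": "Developer", "gpus": 1.0, "sku": "A10G"},
    "a10_24gb_400": {"name": "4 A10G 24GB", "tier": "Enterprise (VPC)", "gpus": 4.0, "sku": "A10G"},
    "a100_80gb_025": {"name": "0.25 A100 80GB", "tier": "Enterprise (Predibase AI Cloud)", "gpus": 0.25, "sku": "A100"},
    "a100_80gb_050": {"name": "0.5 A100 80GB", "tier": "Enterprise (Predibase AI Cloud)", "gpus": 0.5, "sku": "A100"},
    "a100_80gb_100": {"name": "1 A100 80GB", "tier": "Enterprise (Predibase AI Cloud)", "gpus": 1.0, "sku": "A100"},
}


def accelerators_for_tier(tier_prefix):
    """Return the accelerator IDs whose listed tier starts with `tier_prefix`."""
    return sorted(
        acc_id
        for acc_id, info in ACCELERATORS.items()
        if info["tier"].startswith(tier_prefix)
    )


print(accelerators_for_tier("Developer"))
print(accelerators_for_tier("Enterprise"))
```

The returned ID is the string you would pass as the accelerator argument of DeploymentConfig.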

If you would like to upgrade your subscription tier to Enterprise, please reach out to us at sales@predibase.com.

Prompting

Prompt the Base Model

Dedicated LLMs can be prompted via the Python SDK or REST API once they have been deployed.

# Specify the dedicated deployment by the name you gave it at creation
lorax_client = pb.deployments.client("my-mistral-7b-instruct")
print(lorax_client.generate("""<<SYS>>You are a helpful, respectful and honest assistant. Always answer as helpfully as possible, while being safe. Your answers should not include any harmful, unethical, racist, sexist, toxic, dangerous, or illegal content. Please ensure that your responses are socially unbiased and positive in nature. If a question does not make any sense, or is not factually coherent, explain why instead of answering something not correct. If you don't know the answer to a question, please don't share false information.<</SYS>>

[INST] What is the best pizza restaurant in New York? [/INST]""", max_new_tokens=100).generated_text)
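
The prompt above has a fixed shape: a system message wrapped in <<SYS>> tags, a blank line, then the user message wrapped in [INST] tags. A small helper can assemble it so the template is not retyped for every call. The name build_prompt is hypothetical and not part of the SDK, and the template simply mirrors the example on this page; other models may expect a different format.

```python
def build_prompt(system_message, user_message):
    """Assemble a prompt in the same shape as the example on this page."""
    return f"<<SYS>>{system_message}<</SYS>>\n\n[INST] {user_message} [/INST]"


prompt = build_prompt(
    "You are a helpful, respectful and honest assistant.",
    "What is the best pizza restaurant in New York?",
)
print(prompt)
```

The resulting string can be passed as the first argument to lorax_client.generate.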

Prompt a Fine-Tuned Adapter (with LoRAX)

# Specify the dedicated deployment of the base model that was fine-tuned
lorax_client = pb.deployments.client("my-mistral-7b-instruct")

# Specify your adapter_id as "adapter-repo-name/adapter-version-number"
print(lorax_client.generate("""<<SYS>>You are a helpful, respectful and honest assistant. Always answer as helpfully as possible, while being safe. Your answers should not include any harmful, unethical, racist, sexist, toxic, dangerous, or illegal content. Please ensure that your responses are socially unbiased and positive in nature. If a question does not make any sense, or is not factually coherent, explain why instead of answering something not correct. If you don't know the answer to a question, please don't share false information.<</SYS>>

[INST] What is the best pizza restaurant in New York? [/INST]""", adapter_id="adapter-repo-name/1", max_new_tokens=100).generated_text)
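
Because a single base-model deployment can serve many adapters, one way to compare adapter versions is to send the same prompt with different adapter_id values. The sketch below uses only the generate call and generated_text attribute shown above. The helper name compare_adapters is hypothetical, and the adapter IDs in the usage comment are placeholders.

```python
def compare_adapters(client, prompt, adapter_ids, max_new_tokens=100):
    """Run one prompt against several adapters on the same deployment.

    `client` is expected to behave like the object returned by
    pb.deployments.client(...): it needs a generate() method whose result
    has a generated_text attribute.
    """
    results = {}
    for adapter_id in adapter_ids:
        response = client.generate(
            prompt,
            adapter_id=adapter_id,
            max_new_tokens=max_new_tokens,
        )
        results[adapter_id] = response.generated_text
    return results


# Usage with a real client (placeholder adapter IDs):
# outputs = compare_adapters(lorax_client, "...", ["adapter-repo-name/1", "adapter-repo-name/2"])
```

Each call goes to the same deployment, so no extra deployments are created for additional adapters.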

Delete a Deployment

Deployments can be deleted via the SDK to free up compute resources.

pb.deployments.delete("my-mistral-7b-instruct")

Other helpful methods

  • List LLM Deployments - Fetches a list of your LLM deployments
  • Get LLM Status - Checks your deployment's status to see whether it is ready for prompting
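
A new deployment is not ready for prompting immediately, so a script may want to wait for it. The sketch below is a generic polling loop and makes no SDK calls itself. The get_status callable, the "ready" status string, and the helper name wait_until_ready are all assumptions; check the Get LLM Status documentation for the actual call and status values.

```python
import time


def wait_until_ready(get_status, ready_value="ready", timeout=600.0,
                     poll_interval=10.0, sleep=time.sleep):
    """Poll `get_status()` until it returns `ready_value` or `timeout` elapses.

    `get_status` is any zero-argument callable that returns the current
    status; wrap whichever SDK status lookup you use in a lambda.
    """
    deadline = time.monotonic() + timeout
    while True:
        status = get_status()
        if status == ready_value:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"deployment not ready after {timeout} seconds (last status: {status!r})")
        sleep(poll_interval)


# Demonstration with a stand-in status source instead of a real deployment.
_statuses = iter(["initializing", "initializing", "ready"])
print(wait_until_ready(lambda: next(_statuses), sleep=lambda seconds: None))
```

Passing the sleep function as a parameter keeps the loop easy to test without real waiting.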