
Models

Predibase offers the ability to spin up private instances of nearly any open-source model available. These models fall into three categories:

  1. Officially Supported LLMs: These are models for which we provide first-class support, meaning they have been verified to work well. They are also available as shared endpoints for non-VPC customers.
  2. Best-Effort LLMs: These are models that have not been verified and may occasionally fail to deploy as expected.
  3. Embedding Models (new ✨): Deploy a private instance of a variety of supported embedding models.

Officially Supported LLMs

Large Language Models (Text)

| Deployment Name | Parameters | Architecture | License | Context Window (tokens)* | Always On Shared Endpoint** |
|---|---|---|---|---|---|
| llama-3-2-1b | 1 billion | Llama-3 | Meta (request for commercial use) | 32768 | No |
| llama-3-2-1b-instruct | 1 billion | Llama-3 | Meta (request for commercial use) | 32768 | No |
| llama-3-2-3b | 3 billion | Llama-3 | Meta (request for commercial use) | 32768 | No |
| llama-3-2-3b-instruct | 3 billion | Llama-3 | Meta (request for commercial use) | 32768 | No |
| llama-3-1-8b | 8 billion | Llama-3 | Meta (request for commercial use) | 63999 | No |
| llama-3-1-8b-instruct | 8 billion | Llama-3 | Meta (request for commercial use) | 63999 | ✅ Yes |
| llama-3-8b | 8 billion | Llama-3 | Meta (request for commercial use) | 8192 | No |
| llama-3-8b-instruct | 8 billion | Llama-3 | Meta (request for commercial use) | 8192 | No |
| llama-3-70b | 70 billion | Llama-3 | Meta (request for commercial use) | 8192 | No |
| llama-3-70b-instruct | 70 billion | Llama-3 | Meta (request for commercial use) | 8192 | No |
| llama-2-7b | 7 billion | Llama-2 | Meta (request for commercial use) | 4096 | No |
| llama-2-7b-chat | 7 billion | Llama-2 | Meta (request for commercial use) | 4096 | No |
| llama-2-13b | 13 billion | Llama-2 | Meta (request for commercial use) | 4096 | No |
| llama-2-13b-chat | 13 billion | Llama-2 | Meta (request for commercial use) | 4096 | No |
| llama-2-70b | 70 billion | Llama-2 | Meta (request for commercial use) | 4096 | No |
| llama-2-70b-chat | 70 billion | Llama-2 | Meta (request for commercial use) | 4096 | No |
| codellama-7b | 7 billion | Llama-2 | Meta (request for commercial use) | 4096 | No |
| codellama-7b-instruct | 7 billion | Llama-2 | Meta (request for commercial use) | 4096 | No |
| codellama-13b-instruct | 13 billion | Llama-2 | Meta (request for commercial use) | 4096 | No |
| codellama-70b-instruct | 70 billion | Llama-2 | Meta (request for commercial use) | 4096 | No |
| mistral-7b | 7 billion | Mistral | Apache 2.0 | 32768 | No |
| mistral-7b-instruct | 7 billion | Mistral | Apache 2.0 | 32768 | No |
| mistral-7b-instruct-v0-2 | 7 billion | Mistral | Apache 2.0 | 32768 | ✅ Yes |
| mistral-7b-instruct-v0-3 | 7 billion | Mistral | Apache 2.0 | 32768 | No |
| mistral-nemo-12b-2407 | 12 billion | Mistral | Apache 2.0 | 65536 | No |
| mistral-nemo-12b-instruct-2407 | 12 billion | Mistral | Apache 2.0 | 65536 | No |
| mixtral-8x7b-v0-1 | 46.7 billion | Mixtral | Apache 2.0 | 32768 | No |
| mixtral-8x7b-instruct-v0-1 | 46.7 billion | Mixtral | Apache 2.0 | 32768 | No |
| solar-1-mini-chat-240612 | 10.7 billion | Solar | Custom License | 32768 | ✅ Yes |
| solar-pro-preview-instruct | 22.1 billion | Solar | Custom License | 4096 | ✅ Yes |
| zephyr-7b-beta | 7 billion | Mistral | MIT | 8000 | No |
| phi-2 | 2.7 billion | Phi-2 | MIT | 2048 | No |
| phi-3-mini-4k-instruct | 3.8 billion | Phi-3 | MIT | 4096 | No |
| phi-3-5-mini-instruct | 3.8 billion | Phi-3 | MIT | 65536 | No |
| gemma-2b | 2.5 billion | Gemma | Google | 8192 | No |
| gemma-2b-instruct | 2.5 billion | Gemma | Google | 8192 | No |
| gemma-7b | 8.5 billion | Gemma | Google | 8192 | No |
| gemma-7b-instruct | 8.5 billion | Gemma | Google | 8192 | No |
| gemma-2-9b | 9.24 billion | Gemma | Google | 8192 | No |
| gemma-2-9b-instruct | 9.24 billion | Gemma | Google | 8192 | No |
| gemma-2-27b | 27.2 billion | Gemma | Google | 8192 | No |
| gemma-2-27b-instruct | 27.2 billion | Gemma | Google | 8192 | No |
| qwen2-5-1-5b | 1.5 billion | Qwen | Tongyi Qianwen | 65536 | No |
| qwen2-5-1-5b-instruct | 1.5 billion | Qwen | Tongyi Qianwen | 65536 | No |
| qwen2-5-7b | 7 billion | Qwen | Tongyi Qianwen | 65536 | No |
| qwen2-5-7b-instruct | 7 billion | Qwen | Tongyi Qianwen | 65536 | No |
| qwen2-5-14b | 14 billion | Qwen | Tongyi Qianwen | 32768 | No |
| qwen2-5-14b-instruct | 14 billion | Qwen | Tongyi Qianwen | 32768 | No |
| qwen2-5-32b | 32 billion | Qwen | Tongyi Qianwen | 16384 | No |
| qwen2-5-32b-instruct | 32 billion | Qwen | Tongyi Qianwen | 16384 | No |
| qwen2-1-5b | 1.5 billion | Qwen | Tongyi Qianwen | 65536 | No |
| qwen2-1-5b-instruct | 1.5 billion | Qwen | Tongyi Qianwen | 65536 | No |
| qwen2-7b | 7.62 billion | Qwen | Tongyi Qianwen | 131072 | No |
| qwen2-7b-instruct | 7.62 billion | Qwen | Tongyi Qianwen | 131072 | No |
| qwen2-72b | 72.7 billion | Qwen | Tongyi Qianwen | 131072 | No |
| qwen2-72b-instruct | 72.7 billion | Qwen | Tongyi Qianwen | 131072 | No |

*These context windows are well supported when using an A100 GPU. When using a smaller GPU, you may not be able to get the full context window.

**By default, all supported LLMs are available as shared endpoints. Models that are not "Always On" scale down to 0 replicas when idle and may take a short time to spin up before serving requests.

Visual Language Models

| Deployment Name | Parameters | Architecture | License | Context Window (tokens)* | Always On Shared Endpoint** |
|---|---|---|---|---|---|
| llama-3-2-11b-vision | 10.7 billion | Mllama | Meta (request for commercial use) | 32768 | No |
| llama-3-2-11b-vision-instruct | 10.7 billion | Mllama | Meta (request for commercial use) | 32768 | No |

*These context windows are well supported when using an A100 GPU. When using a smaller GPU, you may not be able to get the full context window.

**VLMs are not currently available as shared endpoints; we plan to add shared-endpoint support for them in the future. In the meantime, you can create a private deployment to test base model or fine-tuned inference.

Best-effort LLMs

Predibase provides best-effort support for any Huggingface LLM meeting the following criteria:

How to Deploy a Custom LLM

  1. Get the Huggingface ID of your model by clicking the copy icon on the model's Huggingface page, e.g., "BioMistral/BioMistral-7B".


  2. Pass the Huggingface ID as base_model, the accelerator ID that matches your tier or contract as accelerator, and, if you are deploying a private model, your Huggingface token as hf_token.
```python
from predibase import Predibase, DeploymentConfig

pb = Predibase(api_token="<PREDIBASE API TOKEN>")

pb.deployments.create(
    name="my-biomistral-7b",
    config=DeploymentConfig(
        base_model="BioMistral/BioMistral-7B",
        accelerator="a100_80gb_100",  # Required for custom models
        # hf_token="<YOUR HUGGINGFACE TOKEN>",  # Required for private Huggingface models
        # cooldown_time=3600,  # Value in seconds, defaults to 3600 (1hr)
        min_replicas=0,  # Auto-scales to 0 replicas when not in use; set to 1 for an always-on deployment
        max_replicas=1,
        custom_args=[  # Optional, see DeploymentConfig reference for further details
            "--max-input-length", "1000",
            "--max-batch-prefill-tokens", "1000",  # must be >= --max-input-length
            "--max-total-tokens", "2048",
        ],
    ),
)
```
Caution: For custom models, we apply a conservative default for --max-total-tokens. If your model supports a larger context window and you want to use it, set --max-total-tokens in custom_args.
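Before deploying, it can help to sanity-check the token limits passed in custom_args. The sketch below uses a hypothetical helper, check_token_limits, and assumes the serving engine enforces the same constraints as TGI-style launchers such as LoRAX: --max-input-length must be smaller than --max-total-tokens, and --max-batch-prefill-tokens must be at least --max-input-length. Treat both rules as assumptions and confirm them against the DeploymentConfig reference.

```python
def parse_custom_args(args: list[str]) -> dict[str, int]:
    """Turn a flat ["--flag", "value", ...] list into {"--flag": int(value)}."""
    if len(args) % 2 != 0:
        raise ValueError("custom_args must alternate flag names and values")
    return {args[i]: int(args[i + 1]) for i in range(0, len(args), 2)}


def check_token_limits(args: list[str]) -> list[str]:
    """Hypothetical pre-flight check; returns a list of problems (empty if none).

    The rules below are assumed from TGI-style launchers, not from Predibase docs.
    """
    opts = parse_custom_args(args)
    max_input = opts.get("--max-input-length")
    max_total = opts.get("--max-total-tokens")
    max_prefill = opts.get("--max-batch-prefill-tokens")
    problems = []
    # A request needs room for at least one generated token beyond its input.
    if max_input is not None and max_total is not None and max_input >= max_total:
        problems.append("--max-input-length must be < --max-total-tokens")
    # A single maximum-length prompt must fit into one prefill batch.
    if max_input is not None and max_prefill is not None and max_prefill < max_input:
        problems.append("--max-batch-prefill-tokens must be >= --max-input-length")
    return problems
```

For example, check_token_limits(["--max-input-length", "1000", "--max-batch-prefill-tokens", "100"]) reports that the prefill limit is too small, while a list whose limits are all consistent returns an empty list.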

  3. Prompt your model or adapter as usual.

Instruction Templates

The following instruction templates are applied in the UI when prompting our shared endpoints. When using the SDK or REST API for inference, you need to apply these templates to the prompt yourself; otherwise, response quality may suffer.

Llama 3 models

Instruct models

```
<|begin_of_text|><|start_header_id|>system<|end_header_id|>

You are a helpful, detailed, and polite artificial intelligence assistant. Your answers are clear and suitable for a professional environment.

If context is provided, answer using only the provided contextual information.<|eot_id|><|start_header_id|>user<|end_header_id|>

<insert your prompt here><|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n
```
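When calling a Llama 3 instruct deployment through the SDK or REST API, the template has to be applied in code. A minimal sketch follows; format_llama3_prompt is a hypothetical helper name, and it treats the blank lines and the trailing \n\n in the template as newline characters.

```python
# Default system prompt, copied from the Llama 3 instruct template.
DEFAULT_SYSTEM_PROMPT = (
    "You are a helpful, detailed, and polite artificial intelligence assistant. "
    "Your answers are clear and suitable for a professional environment.\n\n"
    "If context is provided, answer using only the provided contextual information."
)


def format_llama3_prompt(prompt: str, system_prompt: str = DEFAULT_SYSTEM_PROMPT) -> str:
    """Wrap a user prompt in the Llama 3 instruct template (hypothetical helper)."""
    return (
        "<|begin_of_text|><|start_header_id|>system<|end_header_id|>\n\n"
        f"{system_prompt}<|eot_id|>"
        "<|start_header_id|>user<|end_header_id|>\n\n"
        f"{prompt}<|eot_id|>"
        # The prompt ends with the assistant header so the model answers next.
        "<|start_header_id|>assistant<|end_header_id|>\n\n"
    )
```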

Non-instruct models

None

Llama 2 models

Chat models

```
<<SYS>>
You are a helpful, detailed, and polite artificial intelligence assistant. Your answers are clear and suitable for a professional environment.

If context is provided, answer using only the provided contextual information.
<</SYS>>

[INST] <insert your prompt here> [/INST]
```
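A minimal sketch that reproduces the Llama 2 chat template exactly as documented here, with the system block placed before [INST]; format_llama2_chat_prompt is a hypothetical helper name.

```python
# Default system prompt, copied from the Llama 2 chat template.
DEFAULT_SYSTEM_PROMPT = (
    "You are a helpful, detailed, and polite artificial intelligence assistant. "
    "Your answers are clear and suitable for a professional environment.\n\n"
    "If context is provided, answer using only the provided contextual information."
)


def format_llama2_chat_prompt(prompt: str, system_prompt: str = DEFAULT_SYSTEM_PROMPT) -> str:
    """Wrap a user prompt in the Llama 2 chat template (hypothetical helper)."""
    # System block first, then a blank line, then the [INST] ... [/INST] turn.
    return f"<<SYS>>\n{system_prompt}\n<</SYS>>\n\n[INST] {prompt} [/INST]"
```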

Non-chat models

None

Codellama models

codellama-13b-instruct

```
<s>[INST] <insert your prompt here> [/INST]
```

codellama-70b-instruct

```
<s>Source: user\n\n <insert your prompt here> <step> Source: assistant\nDestination: user\n\n
```
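The two CodeLlama templates differ substantially, so a separate helper for each is the simplest option. In this sketch (hypothetical function names), the \n sequences in the codellama-70b-instruct template are read as newline characters, and the spaces around the prompt are kept exactly as documented.

```python
def format_codellama_13b_prompt(prompt: str) -> str:
    """codellama-13b-instruct: a single [INST] turn (hypothetical helper)."""
    return f"<s>[INST] {prompt} [/INST]"


def format_codellama_70b_prompt(prompt: str) -> str:
    """codellama-70b-instruct: "Source:"/"Destination:" turn headers (hypothetical helper)."""
    # "\n" in the documented template is treated as a real newline here.
    return (
        f"<s>Source: user\n\n {prompt} <step> "
        "Source: assistant\nDestination: user\n\n"
    )
```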

Mistral & Mixtral models

```
<<SYS>>
You are a helpful, respectful and honest assistant. Always answer as helpfully as possible, while being safe. Your answers should not include any harmful, unethical, racist, sexist, toxic, dangerous, or illegal content. Please ensure that your responses are socially unbiased and positive in nature.

If a question does not make any sense, or is not factually coherent, explain why instead of answering something not correct. If you don't know the answer to a question, please don't share false information.
<</SYS>>

[INST] <insert your prompt here> [/INST]
```

Solar models

Instruct models

```
<|im_start|>user\n <insert your prompt here> <|im_end|>\n<|im_start|>assistant\n
```
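A minimal sketch for Solar instruct models; format_solar_prompt is a hypothetical helper name, each \n is read as a newline, and the spaces around the prompt are kept as documented.

```python
def format_solar_prompt(prompt: str) -> str:
    """Wrap a user prompt in the Solar instruct template (hypothetical helper)."""
    # User turn, then an open assistant turn for the model to complete.
    return f"<|im_start|>user\n {prompt} <|im_end|>\n<|im_start|>assistant\n"
```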

Non-instruct models

None

Gemma models

Instruct models

```
<start_of_turn>user
<insert your prompt here><end_of_turn>
<start_of_turn>model
```
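A minimal sketch for Gemma instruct models; format_gemma_prompt is a hypothetical helper name. The trailing newline after model is an assumption based on Gemma's published chat format, since the template as shown does not make trailing whitespace visible.

```python
def format_gemma_prompt(prompt: str) -> str:
    """Wrap a user prompt in the Gemma instruct template (hypothetical helper)."""
    return (
        f"<start_of_turn>user\n{prompt}<end_of_turn>\n"
        # Assumption: a newline follows "model", as in Gemma's published chat format.
        "<start_of_turn>model\n"
    )
```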

Non-instruct models

None

Phi-2

```
<|im_start|>user\n<insert your prompt here><|im_end|>\n
```

Zephyr-7b-beta

```
<|system|>
You are a helpful, respectful and honest assistant. Always answer as helpfully as possible, while being safe. Your answers should not include any harmful, unethical, racist, sexist, toxic, dangerous, or illegal content. Please ensure that your responses are socially unbiased and positive in nature.

If a question does not make any sense, or is not factually coherent, explain why instead of answering something not correct. If you don't know the answer to a question, please don't share false information.</s>
<|user|>
<insert your prompt here></s>
<|assistant|>
```
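A minimal sketch for zephyr-7b-beta; format_zephyr_prompt is a hypothetical helper name. The newline after <|assistant|> is an assumption based on Zephyr's published chat template, since the template as shown does not make trailing whitespace visible.

```python
# Default system prompt, copied from the Zephyr template.
DEFAULT_SYSTEM_PROMPT = (
    "You are a helpful, respectful and honest assistant. Always answer as helpfully "
    "as possible, while being safe. Your answers should not include any harmful, "
    "unethical, racist, sexist, toxic, dangerous, or illegal content. Please ensure "
    "that your responses are socially unbiased and positive in nature.\n\n"
    "If a question does not make any sense, or is not factually coherent, explain why "
    "instead of answering something not correct. If you don't know the answer to a "
    "question, please don't share false information."
)


def format_zephyr_prompt(prompt: str, system_prompt: str = DEFAULT_SYSTEM_PROMPT) -> str:
    """Wrap a user prompt in the Zephyr template (hypothetical helper)."""
    return (
        f"<|system|>\n{system_prompt}</s>\n"
        f"<|user|>\n{prompt}</s>\n"
        # Assumption: a newline follows <|assistant|>, as in Zephyr's chat template.
        "<|assistant|>\n"
    )
```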