Models
Predibase offers the ability to spin up private instances of nearly any open-source model available. These models fall into three categories:
- Officially Supported LLMs: These are models for which we have first-class support, meaning they have been verified and are known to work well. They are also available as shared endpoints for non-VPC customers.
- Best-Effort LLMs: These are models that have not been verified and may occasionally not deploy as expected.
- Embedding Models (new ✨): Deploy a private instance of a variety of supported embedding models.
Officially Supported LLMs
Deployment Name | Parameters | Architecture | License | Context Window* | Always On Shared Endpoint** |
---|---|---|---|---|---|
llama-3-2-1b | 1 billion | Llama-3 | Meta (request for commercial use) | 32768 | No |
llama-3-2-1b-instruct | 1 billion | Llama-3 | Meta (request for commercial use) | 32768 | No |
llama-3-2-3b | 3 billion | Llama-3 | Meta (request for commercial use) | 32768 | No |
llama-3-2-3b-instruct | 3 billion | Llama-3 | Meta (request for commercial use) | 32768 | No |
llama-3-1-8b | 8 billion | Llama-3 | Meta (request for commercial use) | 63999 | No |
llama-3-1-8b-instruct | 8 billion | Llama-3 | Meta (request for commercial use) | 63999 | ✅ Yes |
llama-3-8b | 8 billion | Llama-3 | Meta (request for commercial use) | 8192 | No |
llama-3-8b-instruct | 8 billion | Llama-3 | Meta (request for commercial use) | 8192 | No |
llama-3-70b | 70 billion | Llama-3 | Meta (request for commercial use) | 8192 | No |
llama-3-70b-instruct | 70 billion | Llama-3 | Meta (request for commercial use) | 8192 | No |
llama-2-7b | 7 billion | Llama-2 | Meta (request for commercial use) | 4096 | No |
llama-2-7b-chat | 7 billion | Llama-2 | Meta (request for commercial use) | 4096 | No |
llama-2-13b | 13 billion | Llama-2 | Meta (request for commercial use) | 4096 | No |
llama-2-13b-chat | 13 billion | Llama-2 | Meta (request for commercial use) | 4096 | No |
llama-2-70b | 70 billion | Llama-2 | Meta (request for commercial use) | 4096 | No |
llama-2-70b-chat | 70 billion | Llama-2 | Meta (request for commercial use) | 4096 | No |
codellama-7b | 7 billion | Llama-2 | Meta (request for commercial use) | 4096 | No |
codellama-7b-instruct | 7 billion | Llama-2 | Meta (request for commercial use) | 4096 | No |
codellama-13b-instruct | 13 billion | Llama-2 | Meta (request for commercial use) | 4096 | No |
codellama-70b-instruct | 70 billion | Llama-2 | Meta (request for commercial use) | 4096 | No |
mistral-7b | 7 billion | Mistral | Apache 2.0 | 32768 | No |
mistral-7b-instruct | 7 billion | Mistral | Apache 2.0 | 32768 | No |
mistral-7b-instruct-v0-2 | 7 billion | Mistral | Apache 2.0 | 32768 | ✅ Yes |
mistral-7b-instruct-v0-3 | 7 billion | Mistral | Apache 2.0 | 32768 | No |
mistral-nemo-12b-2407 | 12 billion | Mistral | Apache 2.0 | 65536 | No |
mistral-nemo-12b-instruct-2407 | 12 billion | Mistral | Apache 2.0 | 65536 | No |
mixtral-8x7b-v0-1 | 46.7 billion | Mixtral | Apache 2.0 | 32768 | No |
mixtral-8x7b-instruct-v0-1 | 46.7 billion | Mixtral | Apache 2.0 | 32768 | No |
solar-1-mini-chat-240612 | 10.7 billion | Solar | Custom License | 32768 | ✅ Yes |
solar-pro-preview-instruct | 22.1 billion | Solar | Custom License | 4096 | ✅ Yes |
zephyr-7b-beta | 7 billion | Mistral | MIT | 8000 | No |
phi-2 | 2.7 billion | Phi-2 | MIT | 2048 | No |
phi-3-mini-4k-instruct | 3.8 billion | Phi-3 | MIT | 4096 | No |
phi-3-5-mini-instruct | 3.8 billion | Phi-3 | MIT | 65536 | No |
gemma-2b | 2.5 billion | Gemma | Gemma Terms of Use | 8192 | No
gemma-2b-instruct | 2.5 billion | Gemma | Gemma Terms of Use | 8192 | No
gemma-7b | 8.5 billion | Gemma | Gemma Terms of Use | 8192 | No
gemma-7b-instruct | 8.5 billion | Gemma | Gemma Terms of Use | 8192 | No
gemma-2-9b | 9.24 billion | Gemma | Gemma Terms of Use | 8192 | No
gemma-2-9b-instruct | 9.24 billion | Gemma | Gemma Terms of Use | 8192 | No
gemma-2-27b | 27.2 billion | Gemma | Gemma Terms of Use | 8192 | No
gemma-2-27b-instruct | 27.2 billion | Gemma | Gemma Terms of Use | 8192 | No
qwen2-5-1-5b | 1.5 billion | Qwen | Tongyi Qianwen | 65536 | No |
qwen2-5-1-5b-instruct | 1.5 billion | Qwen | Tongyi Qianwen | 65536 | No |
qwen2-5-7b | 7 billion | Qwen | Tongyi Qianwen | 65536 | No |
qwen2-5-7b-instruct | 7 billion | Qwen | Tongyi Qianwen | 65536 | No |
qwen2-5-14b | 14 billion | Qwen | Tongyi Qianwen | 32768 | No |
qwen2-5-14b-instruct | 14 billion | Qwen | Tongyi Qianwen | 32768 | No |
qwen2-1-5b | 1.54 billion | Qwen | Tongyi Qianwen | 65536 | No
qwen2-1-5b-instruct | 1.54 billion | Qwen | Tongyi Qianwen | 65536 | No
qwen2-7b | 7.62 billion | Qwen | Tongyi Qianwen | 131072 | No |
qwen2-7b-instruct | 7.62 billion | Qwen | Tongyi Qianwen | 131072 | No |
qwen2-72b | 72.7 billion | Qwen | Tongyi Qianwen | 131072 | No |
qwen2-72b-instruct | 72.7 billion | Qwen | Tongyi Qianwen | 131072 | No |
*These context windows are well supported when using an A100 GPU. When using a smaller GPU, you may not be able to use the full context window.
**By default, all supported LLMs are available as shared endpoints. Models that are not "Always On" scale down to 0 replicas and may have a brief spin-up time before serving requests.
Best-effort LLMs
Predibase provides best-effort support for any Huggingface LLM meeting the following criteria:
- Uses one of the supported LoRAX architectures
- Has the "Text Generation" and "Transformers" tags
- Does not have a "custom_code" tag
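If you'd like to sanity-check the tag criteria before deploying, one option is to inspect the model's Hub metadata with the `huggingface_hub` client. This is only a sketch, and the model ID is just an example:

```python
# Sketch: check a candidate model's Hub metadata against the tag criteria above.
from huggingface_hub import model_info

info = model_info("BioMistral/BioMistral-7B")  # example model ID

print(info.pipeline_tag)            # expect "text-generation"
print("transformers" in info.tags)  # expect True
print("custom_code" in info.tags)   # expect False
```

Note that this only covers the tag checks; the model's architecture still needs to be one of the supported LoRAX architectures.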
How to Deploy a Custom LLM
- Get the Huggingface ID for your model by clicking the copy icon on the custom base model's page, e.g. "BioMistral/BioMistral-7B".
- Pass the Huggingface ID as the `base_model`, the appropriate accelerator ID for `accelerator` based on your tier or contract, and `hf_token` (your Huggingface token) if deploying a private model.
```python
from predibase import Predibase, DeploymentConfig

pb = Predibase(api_token="<YOUR PREDIBASE API TOKEN>")

pb.deployments.create(
    name="my-biomistral-7b",
    config=DeploymentConfig(
        base_model="BioMistral/BioMistral-7B",
        accelerator="a100_80gb_100",  # Required for custom models
        # hf_token="<YOUR HUGGINGFACE TOKEN>",  # Required for private Huggingface models
        # cooldown_time=3600,  # Value in seconds, defaults to 3600 (1hr)
        min_replicas=0,  # Auto-scales to 0 replicas when not in use; set to 1 for an always-on deployment
        max_replicas=1,
        custom_args=[  # Optional, see DeploymentConfig reference for further details
            "--max-input-length", "1000",
            "--max-batch-prefill-tokens", "100",
            "--max-total-tokens", "2048",
        ],
    ),
)
```
We set conservative defaults for `--max-total-tokens` for custom models, so if you'd like a larger context window for your model, be sure to set `--max-total-tokens` in `custom_args`.
- Prompt your deployment as normal, as in the sketch below.
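For reference, here is a minimal sketch of that last step using the Python SDK. It assumes the `my-biomistral-7b` deployment created above; the prompt and generation parameters are placeholders.

```python
# Minimal sketch: prompt the custom deployment created above.
client = pb.deployments.client("my-biomistral-7b")

response = client.generate(
    "What are the common symptoms of iron deficiency?",  # placeholder prompt
    max_new_tokens=256,
)
print(response.generated_text)
```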
Instruction Templates
The following instruction templates are used in the UI when prompting our shared endpoints. When using the SDK or REST API for inference, you will need to include these templates in the prompt yourself; otherwise, you may see less-than-stellar responses.
Llama 3 models
Instruct models
<|begin_of_text|><|start_header_id|>system<|end_header_id|>
You are a helpful, detailed, and polite artificial intelligence assistant. Your answers are clear and suitable for a professional environment.
If context is provided, answer using only the provided contextual information.<|eot_id|><|start_header_id|>user<|end_header_id|>
<insert your prompt here><|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n
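For example, when prompting `llama-3-1-8b-instruct` through the SDK, one way to apply this template yourself is a small helper like the sketch below (the template constant, prompt, and generation parameters are illustrative, not part of the SDK):

```python
# Sketch: wrap a raw prompt in the Llama 3 instruct template before sending it.
LLAMA_3_INSTRUCT_TEMPLATE = (
    "<|begin_of_text|><|start_header_id|>system<|end_header_id|>\n\n"
    "You are a helpful, detailed, and polite artificial intelligence assistant. "
    "Your answers are clear and suitable for a professional environment. "
    "If context is provided, answer using only the provided contextual information."
    "<|eot_id|><|start_header_id|>user<|end_header_id|>\n\n"
    "{prompt}<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n"
)

client = pb.deployments.client("llama-3-1-8b-instruct")
templated_prompt = LLAMA_3_INSTRUCT_TEMPLATE.format(
    prompt="Summarize our refund policy in one paragraph."
)
print(client.generate(templated_prompt, max_new_tokens=256).generated_text)
```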
Non-instruct models
None
Llama 2 models
Chat models
<<SYS>>
You are a helpful, detailed, and polite artificial intelligence assistant. Your answers are clear and suitable for a professional environment.
If context is provided, answer using only the provided contextual information.
<</SYS>>
[INST] <insert your prompt here> [/INST]
Non-chat models
None
Codellama models
codellama-13b-instruct
<s>[INST] <insert your prompt here> [/INST]
codellama-70b-instruct
<s>Source: user\n\n <insert your prompt here> <step> Source: assistant\nDestination: user\n\n
Mistral & Mixtral models
<<SYS>>
You are a helpful, respectful and honest assistant. Always answer as helpfully as possible, while being safe. Your answers should not include any harmful, unethical, racist, sexist, toxic, dangerous, or illegal content. Please ensure that your responses are socially unbiased and positive in nature.
If a question does not make any sense, or is not factually coherent, explain why instead of answering something not correct. If you don't know the answer to a question, please don't share false information.
<</SYS>>
[INST] <insert your prompt here> [/INST]
Solar models
Instruct models
<|im_start|>user\n <insert your prompt here> <|im_end|>\n<|im_start|>assistant\n
Non-instruct models
None.
Gemma models
Instruct models
<start_of_turn>user
<insert your prompt here><end_of_turn>
<start_of_turn>model
Non-instruct models
None
Phi-2
<|im_start|>user\n<insert your prompt here><|im_end|>\n
Zephyr-7b-beta
<|system|>
You are a helpful, respectful and honest assistant. Always answer as helpfully as possible, while being safe. Your answers should not include any harmful, unethical, racist, sexist, toxic, dangerous, or illegal content. Please ensure that your responses are socially unbiased and positive in nature.
If a question does not make any sense, or is not factually coherent, explain why instead of answering something not correct. If you don't know the answer to a question, please don't share false information.</s>
<|user|>
<insert your prompt here></s>
<|assistant|>