Models
Below is a list of popular OSS models that you can query instantly or deploy on dedicated hardware with Predibase. The models are available via our UI Playground, Python SDK, or REST API.
Inference on our serverless endpoints is billed per token. See pricing.
Serverless Endpoints
Deployment Name | Parameters | Architecture | License | Context Window (Max Tokens) | Always On |
---|---|---|---|---|---|
llama-3-8b | 8 billion | Llama-3 | Meta (request for commercial use) | 8192 | Yes |
llama-3-8b-instruct | 8 billion | Llama-3 | Meta (request for commercial use) | 8192 | Yes |
llama-3-70b | 70 billion | Llama-3 | Meta (request for commercial use) | 8192 | Yes |
llama-3-70b-instruct | 70 billion | Llama-3 | Meta (request for commercial use) | 8192 | Yes |
mistral-7b | 7 billion | Mistral | Apache 2.0 | 8000 | Yes |
mistral-7b-instruct | 7 billion | Mistral | Apache 2.0 | 8000 | Yes |
mistral-7b-instruct-v0-2 | 7 billion | Mistral | Apache 2.0 | 8000 | Yes |
mixtral-8x7b-instruct-v0-1 | 46.7 billion | Mixtral | Apache 2.0 | 32768 | Yes |
zephyr-7b-beta | 7 billion | Mistral | MIT | 8000 | Yes |
llama-2-7b | 7 billion | Llama-2 | Meta (request for commercial use) | 4096 | Yes |
llama-2-7b-chat | 7 billion | Llama-2 | Meta (request for commercial use) | 4096 | Yes |
llama-2-13b | 13 billion | Llama-2 | Meta (request for commercial use) | 4096 | No |
llama-2-13b-chat | 13 billion | Llama-2 | Meta (request for commercial use) | 4096 | No |
llama-2-70b | 70 billion | Llama-2 | Meta (request for commercial use) | 4096 | No |
llama-2-70b-chat | 70 billion | Llama-2 | Meta (request for commercial use) | 4096 | No |
codellama-13b-instruct | 13 billion | Llama-2 | Meta (request for commercial use) | 4096 | Yes |
codellama-70b-instruct | 70 billion | Llama-2 | Meta (request for commercial use) | 4096 | No |
gemma-2b | 2 billion | Gemma | | 8192 | No |
gemma-2b-instruct | 2 billion | Gemma | | 8192 | No |
gemma-7b | 7 billion | Gemma | | 8192 | No |
gemma-7b-instruct | 7 billion | Gemma | | 8192 | No |
phi-2 | 2.7 billion | Phi | MIT | 2048 | No |
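The context window is shared between the prompt and the generated tokens, so a request only fits if both together stay within the limit in the table. The sketch below checks that for a few rows copied from the table. `CONTEXT_WINDOWS` and `fits_context` are hypothetical helpers, not part of the Predibase SDK, and the sketch assumes you have already counted the prompt's tokens with the model's own tokenizer.

```python
# Context windows (max tokens) copied from a few rows of the serverless table above.
CONTEXT_WINDOWS = {
    "llama-3-8b-instruct": 8192,
    "mistral-7b-instruct-v0-2": 8000,
    "mixtral-8x7b-instruct-v0-1": 32768,
    "llama-2-7b-chat": 4096,
    "phi-2": 2048,
}


def fits_context(model: str, prompt_tokens: int, max_new_tokens: int) -> bool:
    """Return True if the prompt plus the requested completion fit in the model's window."""
    # Prompt tokens and generated tokens both count against the same window.
    return prompt_tokens + max_new_tokens <= CONTEXT_WINDOWS[model]


print(fits_context("phi-2", 1800, 256))                       # False: 2056 > 2048
print(fits_context("mixtral-8x7b-instruct-v0-1", 1800, 256))  # True
```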
Note: Models that are not always on scale down to zero when idle and may take a short time to spin up before serving requests. If you would like us to add a serverless endpoint or make an existing endpoint always on, please get in touch on Discord.
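One way a client might tolerate that spin-up time is to retry with exponential backoff. This is a minimal sketch, not Predibase behavior: `call_with_retry` is a hypothetical helper, you would wrap your actual SDK or REST call in a zero-argument function, and which exceptions signal a cold start depends on your client, so narrow `retry_on` from its broad default.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def call_with_retry(
    call: Callable[[], T],
    attempts: int = 5,
    base_delay: float = 2.0,
    retry_on: Tuple[Type[BaseException], ...] = (Exception,),
) -> T:
    """Call `call`, retrying with exponential backoff while it raises one of `retry_on`."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return call()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            # Sleep base_delay, 2*base_delay, 4*base_delay, ... between tries.
            time.sleep(base_delay * (2 ** attempt))
    raise AssertionError("unreachable")


# Usage with a stand-in for a real request that fails twice, then succeeds.
calls = {"n": 0}


def flaky_request() -> str:
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("endpoint still spinning up")
    return "ok"


print(call_with_retry(flaky_request, base_delay=0.01))  # ok
```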
Dedicated Deployments
Beyond the serverless endpoints above, Predibase can also spin up deployments on dedicated hardware for nearly any open-source model. These models fall into two categories:
- Available LLMs: models with first-class support that have been verified to work well.
- Best-Effort LLMs: models that have not been verified and may occasionally fail to deploy as expected.
Available LLMs
When creating a dedicated deployment, specify the model by its Hugging Face path (listed below) rather than its short name.
Name | Hugging Face Path | Parameters | Architecture | License | Context Window (Max Tokens) |
---|---|---|---|---|---|
mistral-7b | mistralai/Mistral-7B-v0.1 | 7 billion | Mistral | Apache 2.0 | 8000 |
mistral-7b-instruct | mistralai/Mistral-7B-Instruct-v0.1 | 7 billion | Mistral | Apache 2.0 | 8000 |
mistral-7b-instruct-v0-2 | mistralai/Mistral-7B-Instruct-v0.2 | 7 billion | Mistral | Apache 2.0 | 8000 |
mixtral-8x7b | mistralai/Mixtral-8x7B-v0.1 | 46.7 billion | Mixtral | Apache 2.0 | 32768 |
mixtral-8x7b-instruct-v0-1 | mistralai/Mixtral-8x7B-Instruct-v0.1 | 46.7 billion | Mixtral | Apache 2.0 | 32768 |
Mixtral-8x7B-Instruct-v0.1-AWQ | TheBloke/Mixtral-8x7B-Instruct-v0.1-AWQ | 46.7 billion | Mixtral | Apache 2.0 | 32768 |
zephyr-7b-beta | HuggingFaceH4/zephyr-7b-beta | 7 billion | Mistral | MIT | 8000 |
llama-3-8b | meta-llama/Meta-Llama-3-8B | 8 billion | Llama-3 | Meta (request for commercial use) | 8192 |
llama-3-8b-instruct | meta-llama/Meta-Llama-3-8B-Instruct | 8 billion | Llama-3 | Meta (request for commercial use) | 8192 |
llama-3-70b | meta-llama/Meta-Llama-3-70B | 70 billion | Llama-3 | Meta (request for commercial use) | 8192 |
llama-3-70b-instruct | meta-llama/Meta-Llama-3-70B-Instruct | 70 billion | Llama-3 | Meta (request for commercial use) | 8192 |
llama-2-7b | meta-llama/Llama-2-7b-hf | 7 billion | Llama-2 | Meta (request for commercial use) | 4096 |
llama-2-7b-chat | meta-llama/Llama-2-7b-chat-hf | 7 billion | Llama-2 | Meta (request for commercial use) | 4096 |
llama-2-13b | meta-llama/Llama-2-13b-hf | 13 billion | Llama-2 | Meta (request for commercial use) | 4096 |
llama-2-13b-chat | meta-llama/Llama-2-13b-chat-hf | 13 billion | Llama-2 | Meta (request for commercial use) | 4096 |
llama-2-70b | meta-llama/Llama-2-70b-hf | 70 billion | Llama-2 | Meta (request for commercial use) | 4096 |
llama-2-70b-chat | meta-llama/Llama-2-70b-chat-hf | 70 billion | Llama-2 | Meta (request for commercial use) | 4096 |
codellama-7b-instruct | codellama/CodeLlama-7b-instruct-hf | 7 billion | Llama-2 | Meta (request for commercial use) | 4096 |
codellama-13b-instruct | codellama/CodeLlama-13b-instruct-hf | 13 billion | Llama-2 | Meta (request for commercial use) | 4096 |
codellama-34b-instruct | codellama/CodeLlama-34b-instruct-hf | 34 billion | Llama-2 | Meta (request for commercial use) | 4096 |
codellama-70b-instruct | codellama/CodeLlama-70b-Instruct-hf | 70 billion | Llama-2 | Meta (request for commercial use) | 4096 |
gemma-2b | google/gemma-2b | 2 billion | Gemma | | 8192 |
gemma-2b-instruct | google/gemma-2b-it | 2 billion | Gemma | | 8192 |
gemma-7b | google/gemma-7b | 7 billion | Gemma | | 8192 |
gemma-7b-instruct | google/gemma-7b-it | 7 billion | Gemma | | 8192 |
gpt2 | openai-community/gpt2 | 124 million | GPT | MIT | 1024 |
gpt2-medium | openai-community/gpt2-medium | 355 million | GPT | MIT | 1024 |
gpt2-large | openai-community/gpt2-large | 774 million | GPT | MIT | 1024 |
gpt2-xl | openai-community/gpt2-xl | 1.5 billion | GPT | MIT | 1024 |
phi-2 | microsoft/phi-2 | 2.7 billion | Phi | MIT | 2048 |
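Because a dedicated deployment takes the Hugging Face path rather than the short name, a small lookup helps avoid typos. `HF_PATHS` and `hf_path` are illustrative helpers, not Predibase APIs, and the deployment call itself is omitted because its signature is not given on this page.

```python
# Short names mapped to Hugging Face paths, copied from a few rows of the table above.
HF_PATHS = {
    "mistral-7b-instruct-v0-2": "mistralai/Mistral-7B-Instruct-v0.2",
    "llama-3-8b-instruct": "meta-llama/Meta-Llama-3-8B-Instruct",
    "codellama-70b-instruct": "codellama/CodeLlama-70b-Instruct-hf",
    "gemma-7b-instruct": "google/gemma-7b-it",
    "phi-2": "microsoft/phi-2",
}


def hf_path(name: str) -> str:
    """Resolve a short model name to its Hugging Face path; pass full paths through unchanged."""
    if "/" in name:
        return name  # already looks like "<org>/<repo>"
    if name not in HF_PATHS:
        raise ValueError(f"unknown model name {name!r}; use its Hugging Face path instead")
    return HF_PATHS[name]


print(hf_path("gemma-7b-instruct"))  # google/gemma-7b-it
print(hf_path("TheBloke/Mixtral-8x7B-Instruct-v0.1-AWQ"))  # passed through unchanged
```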
Best-Effort LLMs
Predibase provides best-effort support for any Hugging Face LLM that meets the following criteria:
- Uses one of the supported LoRAX architectures
- Has the "Text Generation" and "Transformers" tags
- Does not have a "custom_code" tag
To deploy a quantized LLM, its quantization method must be supported by LoRAX. For example, Mixtral-8x7B-Instruct-v0.1-AWQ in the table above is an AWQ-quantized model. Note that we currently do not support fine-tuning models that have already been quantized.
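The criteria above, plus the quantization requirement, can be checked mechanically before attempting a deployment. This sketch rests on several assumptions: tags are compared using their Hugging Face identifiers (`text-generation`, `transformers`, `custom_code`), the architecture is given as the model's `model_type` string, and `SUPPORTED_ARCHITECTURES` and `SUPPORTED_QUANTIZATION` are illustrative placeholders rather than the authoritative LoRAX lists. `eligibility_problems` is a hypothetical helper.

```python
from typing import Iterable, List, Optional

# Illustrative subsets only; consult the LoRAX documentation for the real lists.
SUPPORTED_ARCHITECTURES = {"llama", "mistral", "mixtral", "gemma", "phi", "gpt2"}
SUPPORTED_QUANTIZATION = {"bitsandbytes", "gptq", "awq"}


def eligibility_problems(
    tags: Iterable[str], architecture: str, quantization: Optional[str] = None
) -> List[str]:
    """List reasons a model fails the best-effort criteria; an empty list means eligible."""
    tag_set = {t.lower() for t in tags}
    problems = []
    if architecture.lower() not in SUPPORTED_ARCHITECTURES:
        problems.append(f"architecture {architecture!r} is not in the supported set")
    if "text-generation" not in tag_set:
        problems.append("missing 'text-generation' tag")
    if "transformers" not in tag_set:
        problems.append("missing 'transformers' tag")
    if "custom_code" in tag_set:
        problems.append("has 'custom_code' tag")
    if quantization is not None and quantization.lower() not in SUPPORTED_QUANTIZATION:
        problems.append(f"quantization {quantization!r} is not in the supported set")
    return problems


print(eligibility_problems(["text-generation", "transformers"], "mistral"))  # []
print(eligibility_problems(["text-generation", "transformers", "custom_code"], "mistral"))
```

To check a real model, one option is the `huggingface_hub` library, whose `HfApi().model_info(repo_id).tags` returns the tag list for a repository.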
Instruction Templates
The UI applies the following instruction templates when prompting our serverless deployments. When using the SDK or REST API for inference, you need to wrap your prompt in the appropriate template yourself; otherwise, response quality may suffer.
Llama 3 models
Instruct models
<|begin_of_text|><|start_header_id|>system<|end_header_id|>
You are a helpful, detailed, and polite artificial intelligence assistant. Your answers are clear and suitable for a professional environment.
If context is provided, answer using only the provided contextual information.<|eot_id|><|start_header_id|>user<|end_header_id|>
<insert your prompt here><|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n
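Applied from code, the Llama 3 instruct template reduces to string formatting. In this sketch, each line break in the template is treated as a single newline and the trailing `\n\n` as two; that whitespace mapping is an assumption, since Meta's published Llama 3 format places a blank line after every `<|end_header_id|>`. The name `llama3_instruct_prompt` is hypothetical.

```python
LLAMA3_SYSTEM = (
    "You are a helpful, detailed, and polite artificial intelligence assistant. "
    "Your answers are clear and suitable for a professional environment.\n"
    "If context is provided, answer using only the provided contextual information."
)


def llama3_instruct_prompt(user_prompt: str, system: str = LLAMA3_SYSTEM) -> str:
    """Wrap a user prompt in the Llama 3 instruct template shown above."""
    return (
        "<|begin_of_text|><|start_header_id|>system<|end_header_id|>\n"
        f"{system}<|eot_id|><|start_header_id|>user<|end_header_id|>\n"
        f"{user_prompt}<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n"
    )


print(llama3_instruct_prompt("Summarize LoRA in one sentence."))
```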
Non-instruct models
None. Send your prompt as-is.
Llama 2 models
Chat models
<<SYS>>
You are a helpful, detailed, and polite artificial intelligence assistant. Your answers are clear and suitable for a professional environment.
If context is provided, answer using only the provided contextual information.
<</SYS>>
[INST] <insert your prompt here> [/INST]
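The Llama 2 chat template can be filled the same way. This sketch reproduces the template exactly as written above, assuming each line break is a single newline; `llama2_chat_prompt` is a hypothetical name.

```python
LLAMA2_SYSTEM = (
    "You are a helpful, detailed, and polite artificial intelligence assistant. "
    "Your answers are clear and suitable for a professional environment.\n"
    "If context is provided, answer using only the provided contextual information."
)


def llama2_chat_prompt(user_prompt: str, system: str = LLAMA2_SYSTEM) -> str:
    """Wrap a user prompt in the Llama 2 chat template shown above."""
    return f"<<SYS>>\n{system}\n<</SYS>>\n[INST] {user_prompt} [/INST]"


print(llama2_chat_prompt("List three uses of fine-tuning."))
```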
Non-chat models
None. Send your prompt as-is.
Code Llama models
codellama-13b-instruct
<s>[INST] <insert your prompt here> [/INST]
codellama-70b-instruct
<s>Source: user\n\n <insert your prompt here> <step> Source: assistant\nDestination: user\n\n
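The two Code Llama templates differ, so each gets its own helper in this sketch. It treats every `\n` in the templates as a newline character and keeps the spaces exactly as shown; both function names are hypothetical.

```python
def codellama_13b_prompt(user_prompt: str) -> str:
    """Template for codellama-13b-instruct."""
    return f"<s>[INST] {user_prompt} [/INST]"


def codellama_70b_prompt(user_prompt: str) -> str:
    """Template for codellama-70b-instruct, which uses Source/Destination headers."""
    return (
        "<s>Source: user\n\n "
        f"{user_prompt} <step> Source: assistant\nDestination: user\n\n"
    )


print(codellama_13b_prompt("Write a Python function that reverses a string."))
print(codellama_70b_prompt("Write a Python function that reverses a string."))
```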
Mistral & Mixtral models
<<SYS>>
You are a helpful, respectful and honest assistant. Always answer as helpfully as possible, while being safe. Your answers should not include any harmful, unethical, racist, sexist, toxic, dangerous, or illegal content. Please ensure that your responses are socially unbiased and positive in nature.
If a question does not make any sense, or is not factually coherent, explain why instead of answering something not correct. If you don't know the answer to a question, please don't share false information.
<</SYS>>
[INST] <insert your prompt here> [/INST]
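The Mistral and Mixtral template uses the same `<<SYS>>` and `[INST]` markers with a different default system prompt. This sketch assumes each line break is a single newline; `mistral_prompt` is a hypothetical name, and the `system` argument lets you substitute your own instructions.

```python
MISTRAL_SYSTEM = (
    "You are a helpful, respectful and honest assistant. Always answer as helpfully as "
    "possible, while being safe. Your answers should not include any harmful, unethical, "
    "racist, sexist, toxic, dangerous, or illegal content. Please ensure that your "
    "responses are socially unbiased and positive in nature.\n"
    "If a question does not make any sense, or is not factually coherent, explain why "
    "instead of answering something not correct. If you don't know the answer to a "
    "question, please don't share false information."
)


def mistral_prompt(user_prompt: str, system: str = MISTRAL_SYSTEM) -> str:
    """Wrap a user prompt in the Mistral & Mixtral template shown above."""
    return f"<<SYS>>\n{system}\n<</SYS>>\n[INST] {user_prompt} [/INST]"


print(mistral_prompt("Explain mixture-of-experts in two sentences."))
```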
Gemma models
Instruct models
<start_of_turn>user
<insert your prompt here><end_of_turn>
<start_of_turn>model
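For Gemma instruct models, the template is a single user turn followed by an open model turn. This sketch ends the string at `<start_of_turn>model` as the template above does; Google's published Gemma chat format also places a newline after `model`, so appending `\n` is a reasonable variant to try. `gemma_instruct_prompt` is a hypothetical name.

```python
def gemma_instruct_prompt(user_prompt: str) -> str:
    """Wrap a user prompt in the Gemma instruct template shown above."""
    return f"<start_of_turn>user\n{user_prompt}<end_of_turn>\n<start_of_turn>model"


print(gemma_instruct_prompt("Give me a haiku about GPUs."))
```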
Non-instruct models
None. Send your prompt as-is.
Phi-2
<|im_start|>user\n<insert your prompt here><|im_end|>\n
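The Phi-2 template wraps the prompt in `<|im_start|>` and `<|im_end|>` markers. This sketch treats each `\n` as a newline character; `phi2_prompt` is a hypothetical name.

```python
def phi2_prompt(user_prompt: str) -> str:
    """Wrap a user prompt in the Phi-2 template shown above."""
    return f"<|im_start|>user\n{user_prompt}<|im_end|>\n"


print(phi2_prompt("What is 17 * 3?"))
```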
Zephyr-7b-beta
<|system|>
You are a helpful, respectful and honest assistant. Always answer as helpfully as possible, while being safe. Your answers should not include any harmful, unethical, racist, sexist, toxic, dangerous, or illegal content. Please ensure that your responses are socially unbiased and positive in nature.
If a question does not make any sense, or is not factually coherent, explain why instead of answering something not correct. If you don't know the answer to a question, please don't share false information.</s>
<|user|>
<insert your prompt here></s>
<|assistant|>
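The Zephyr template marks each turn with a role tag and closes the system and user turns with `</s>`. This sketch assumes each line break is a single newline and, like the template above, ends at `<|assistant|>`; `zephyr_prompt` is a hypothetical name.

```python
ZEPHYR_SYSTEM = (
    "You are a helpful, respectful and honest assistant. Always answer as helpfully as "
    "possible, while being safe. Your answers should not include any harmful, unethical, "
    "racist, sexist, toxic, dangerous, or illegal content. Please ensure that your "
    "responses are socially unbiased and positive in nature.\n"
    "If a question does not make any sense, or is not factually coherent, explain why "
    "instead of answering something not correct. If you don't know the answer to a "
    "question, please don't share false information."
)


def zephyr_prompt(user_prompt: str, system: str = ZEPHYR_SYSTEM) -> str:
    """Wrap a user prompt in the Zephyr-7b-beta template shown above."""
    return f"<|system|>\n{system}</s>\n<|user|>\n{user_prompt}</s>\n<|assistant|>"


print(zephyr_prompt("Suggest a name for a pet robot."))
```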