

Below is a list of popular OSS models that you can query instantly or deploy on dedicated hardware with Predibase. The models are available via our UI Playground, Python SDK, or REST API.


When referring to an existing Predibase deployment for inference, use the pb://deployments/... path. Use the hf://... path only when fine-tuning or creating a new deployment.

llm_deployment = pc.LLM("pb://deployments/mistral-7b")
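The two URI schemes can also be told apart programmatically. The following is a minimal sketch, not part of the Predibase SDK: the helper name `describe_model_uri` and the returned labels are hypothetical and only restate the rule above.

```python
def describe_model_uri(uri: str) -> tuple[str, str, str]:
    """Split a model URI into (scheme, path, intended use).

    Hypothetical helper for illustration; it only encodes the rule that
    pb://deployments/... is for inference and hf://... is for fine-tuning
    or creating a new deployment.
    """
    scheme, sep, path = uri.partition("://")
    if not sep or not path:
        raise ValueError(f"not a model URI: {uri!r}")

    if scheme == "pb" and path.startswith("deployments/"):
        use = "inference against an existing Predibase deployment"
    elif scheme == "hf":
        use = "fine-tuning or creating a new deployment"
    else:
        raise ValueError(f"unrecognized model URI: {uri!r}")
    return scheme, path, use
```

For example, `describe_model_uri("pb://deployments/mistral-7b")` yields the scheme `pb`, the path `deployments/mistral-7b`, and the inference label.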

Serverless Endpoints

| Deployment Name | Parameters | Architecture | License | Context Window (Max Tokens) | Always On |
| --- | --- | --- | --- | --- | --- |
| mistral-7b | 7 billion | Mistral | Apache 2.0 | 8000 | Yes |
| mistral-7b-instruct | 7 billion | Mistral | Apache 2.0 | 8000 | Yes |
| mistral-7b-instruct-v0-2 | 7 billion | Mistral | Apache 2.0 | 8000 | Yes |
| mixtral-8x7b-instruct-v0-1 | 46.7 billion | Mixtral | Apache 2.0 | 32768 | Yes |
| zephyr-7b-beta | 7 billion | Mistral | MIT | 8000 | Yes |
| llama-2-7b | 7 billion | Llama-2 | Meta (request for commercial use) | 4096 | Yes |
| llama-2-7b-chat | 7 billion | Llama-2 | Meta (request for commercial use) | 4096 | Yes |
| llama-2-13b | 13 billion | Llama-2 | Meta (request for commercial use) | 4096 | No |
| llama-2-13b-chat | 13 billion | Llama-2 | Meta (request for commercial use) | 4096 | No |
| llama-2-70b | 70 billion | Llama-2 | Meta (request for commercial use) | 4096 | No |
| llama-2-70b-chat | 70 billion | Llama-2 | Meta (request for commercial use) | 4096 | No |
| codellama-13b-instruct | 13 billion | Llama-2 | Meta (request for commercial use) | 4096 | Yes |
| codellama-70b-instruct | 70 billion | Llama-2 | Meta (request for commercial use) | 4096 | No |
| gemma-2b | 2 billion | Gemma | Google | 8192 | No |
| gemma-2b-instruct | 2 billion | Gemma | Google | 8192 | No |
| gemma-7b | 7 billion | Gemma | Google | 8192 | No |
| gemma-7b-instruct | 7 billion | Gemma | Google | 8192 | No |
| phi-2 | 2.7 billion | Phi | MIT | 2048 | No |
| yarn-mistral-7b-128k | 7 billion | Mistral | Apache 2.0 | 128000 | No |

Note: Models that are not always on scale down to 0 and may need a brief spin-up time before serving requests. If you would like us to add support for any serverless endpoints or make any existing endpoints always on, please get in touch on Discord.
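Client code can anticipate the spin-up delay by checking the "Always On" column before sending a request. The sketch below copies that column from the serverless table into two sets. The function name `may_need_spin_up` is hypothetical, and the sets are a snapshot of this page rather than something queried from Predibase.

```python
# Serverless deployments listed with "Always On: Yes" in the table above.
ALWAYS_ON = frozenset({
    "mistral-7b",
    "mistral-7b-instruct",
    "mistral-7b-instruct-v0-2",
    "mixtral-8x7b-instruct-v0-1",
    "zephyr-7b-beta",
    "llama-2-7b",
    "llama-2-7b-chat",
    "codellama-13b-instruct",
})

# Serverless deployments listed with "Always On: No"; these scale down to 0.
SCALES_TO_ZERO = frozenset({
    "llama-2-13b",
    "llama-2-13b-chat",
    "llama-2-70b",
    "llama-2-70b-chat",
    "codellama-70b-instruct",
    "gemma-2b",
    "gemma-2b-instruct",
    "gemma-7b",
    "gemma-7b-instruct",
    "phi-2",
    "yarn-mistral-7b-128k",
})


def may_need_spin_up(deployment_name: str) -> bool:
    """Return True if the first request may wait for the model to spin up."""
    if deployment_name in ALWAYS_ON:
        return False
    if deployment_name in SCALES_TO_ZERO:
        return True
    raise KeyError(f"not a listed serverless deployment: {deployment_name!r}")
```

A caller might use this to pick a longer request timeout for deployments where `may_need_spin_up` returns `True`.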

Dedicated Deployments

While popular models can be prompted via serverless endpoints, Predibase also offers the ability to spin up deployments on dedicated hardware for nearly any open-source model available. These models fall into two categories:

  1. Available LLMs: Models with first-class support. These have been verified to work well.
  2. Best-Effort LLMs: Models that have not been verified and may occasionally fail to deploy as expected.

Available LLMs

| Name | Parameters | Architecture | License | Context Window (Max Tokens) |
| --- | --- | --- | --- | --- |
| mistral-7b | 7 billion | Mistral | Apache 2.0 | 8000 |
| mistral-7b-instruct-v0.1 | 7 billion | Mistral | Apache 2.0 | 8000 |
| mistral-7b-instruct-v0.2 | 7 billion | Mistral | Apache 2.0 | 8000 |
| Mixtral-8x7B-v0.1 | 46.7 billion | Mixtral | Apache 2.0 | 32768 |
| Mixtral-8x7B-Instruct-v0.1 | 46.7 billion | Mixtral | Apache 2.0 | 32768 |
| Mixtral-8x7B-Instruct-v0.1-AWQ | 46.7 billion | Mixtral | Apache 2.0 | 32768 |
| zephyr-7b-beta | 7 billion | Mistral | MIT | 8000 |
| llama-2-7b | 7 billion | Llama-2 | Meta (request for commercial use) | 4096 |
| llama-2-7b-chat | 7 billion | Llama-2 | Meta (request for commercial use) | 4096 |
| llama-2-13b | 13 billion | Llama-2 | Meta (request for commercial use) | 4096 |
| llama-2-13b-chat | 13 billion | Llama-2 | Meta (request for commercial use) | 4096 |
| llama-2-70b | 70 billion | Llama-2 | Meta (request for commercial use) | 4096 |
| llama-2-70b-chat | 70 billion | Llama-2 | Meta (request for commercial use) | 4096 |
| Yarn-Mistral-7b-128k | 7 billion | Mistral | Apache 2.0 | 128000 |
| Yarn-Mistral-7B-128k-AWQ | 7 billion | Mistral | Apache 2.0 | 128000 |
| codellama-7b-instruct | 7 billion | Llama-2 | Meta (request for commercial use) | 4096 |
| codellama-13b-instruct | 13 billion | Llama-2 | Meta (request for commercial use) | 4096 |
| codellama-34b-instruct | 34 billion | Llama-2 | Meta (request for commercial use) | 4096 |
| codellama-70b-instruct | 70 billion | Llama-2 | Meta (request for commercial use) | 4096 |
| gemma-2b | 2 billion | Gemma | Google | 8192 |
| gemma-2b-instruct | 2 billion | Gemma | Google | 8192 |
| gemma-7b | 7 billion | Gemma | Google | 8192 |
| gemma-7b-instruct | 7 billion | Gemma | Google | 8192 |
| gpt2 | 124 million | GPT | MIT | 1024 |
| gpt2-medium | 355 million | GPT | MIT | 1024 |
| gpt2-large | 774 million | GPT | MIT | 1024 |
| gpt2-xl | 1.5 billion | GPT | MIT | 1024 |
| phi-2 | 2.7 billion | Phi | MIT | 2048 |
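The context window column can be used to budget generation length. The sketch below assumes the listed window bounds prompt tokens and generated tokens combined, which is the usual convention for decoder-only models but is not stated on this page. The dictionary holds a few rows copied from the table above, and `max_new_tokens_budget` is a hypothetical helper.

```python
# A few context windows (in tokens) copied from the table above.
CONTEXT_WINDOW = {
    "mistral-7b": 8000,
    "llama-2-7b": 4096,
    "gemma-7b": 8192,
    "gpt2": 1024,
    "phi-2": 2048,
}


def max_new_tokens_budget(model: str, prompt_tokens: int) -> int:
    """Return how many tokens can still be generated after the prompt.

    Assumes prompt and generated tokens share one context window.
    Returns 0 when the prompt alone already fills or exceeds the window.
    """
    if prompt_tokens < 0:
        raise ValueError("prompt_tokens must be non-negative")
    window = CONTEXT_WINDOW[model]
    return max(window - prompt_tokens, 0)
```

With these values, a 1000-token prompt sent to gpt2 leaves room for 24 generated tokens, while the same prompt sent to llama-2-7b leaves 3096.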

Best-effort LLMs

Predibase provides best-effort support for any Hugging Face LLM meeting the following criteria:

To deploy LLMs with quantization, the quantization method must be supported in LoRAX. Example here. Note that at the moment, we do not support fine-tuning any post-quantized models.

Predibase Deployment URI

To specify a serverless LLM or one of your dedicated LLMs, you need to pass in the deployment URI, like so:

llm_deployment = pc.LLM("pb://deployments/{deployment-name}")

For a dedicated deployment, the deployment-name is the name you provided when calling llm.deploy.