Language Models

Predibase supports two categories of language models for deployment:

Officially Supported LLMs - These are models we have first-class support, meaning they have been verified and are ensured to work well. They are also available as Shared Endpoints for SaaS customers.
Custom Base Models - Predibase offers best-effort support for deploying custom base models from Huggingface.

Quick Start

First, install the Predibase Python SDK:

pip install -U predibase

Using Shared Endpoints

Get started quickly with our pre-deployed shared endpoints:

from predibase import Predibase

# Initialize the client
pb = Predibase(api_token="<PREDIBASE_API_TOKEN>")

# Use the Mistral shared endpoint - no deployment needed
client = pb.deployments.client("llama-3-2-3b-instruct")
response = client.generate("What is machine learning?")
print(response.generated_text)

Creating Private Deployments

For production use cases, create your own dedicated deployment:

from predibase import Predibase, DeploymentConfig

pb = Predibase(api_token="<PREDIBASE_API_TOKEN>")

# Create a production deployment with Llama 3
deployment = pb.deployments.create(
    name="my-llama-3-2-3b-instruct",
    config=DeploymentConfig(
        base_model="llama-3-2-3b-instruct",
        min_replicas=0,  # Scale down to 0 replicas so you don't get charged when there are no requests
        max_replicas=1,
        accelerator="l40s_48gb_100"  # Specify hardware requirements
    )
)

# Use your private deployment
client = pb.deployments.client("my-llama-3-2-3b-instruct")
response = client.generate("Explain the theory of relativity.")
print(response.generated_text)

Officially Supported Models

These models are fully tested, optimized, and supported by Predibase:

DeepSeek Models

Model Name	Parameters	Context	Architecture	License	GPU	Always On Shared Endpoint
deepseek-r1-distill-qwen-32b	32.8B	8K	Qwen	MIT	A100	❌

Additional DeepSeek R1 and V3 models (original or distilled) are available upon request. Contact sales@predibase.com for deploying other DeepSeek models.

Mistral & Mixtral Models

Model Name	Parameters	Context	Architecture	License	GPU	Always On Shared Endpoint
mistral-7b-instruct-v0-2	7B	32K	Mistral	Apache 2.0	A100	❌
mixtral-8x7b-instruct-v0-1	47B	8K	Mistral MoE	Apache 2.0	A100	❌

Llama 3 Models

Model Name	Parameters	Context	Architecture	License	GPU	Always On Shared Endpoint
llama-3-3-70b-instruct	70B	32K	Llama-3	Meta	A100	❌
llama-3-2-1b	1B	64K	Llama-3	Meta	A100	❌
llama-3-2-1b-instruct	1B	64K	Llama-3	Meta	A100	❌
llama-3-2-3b	3B	32K	Llama-3	Meta	A100	❌
llama-3-2-3b-instruct	3B	32K	Llama-3	Meta	A100	❌
llama-3-1-8b	8B	64K	Llama-3	Meta	A100	❌
llama-3-1-8b-instruct	8B	64K	Llama-3	Meta	A100	✅
llama-3-8b	8B	8K	Llama-3	Meta	A10G+	❌
llama-3-8b-instruct	8B	8K	Llama-3	Meta	A10G+	❌
llama-3-70b	70B	8K	Llama-3	Meta	A100	❌
llama-3-70b-instruct	70B	8K	Llama-3	Meta	A100	❌

Llama 2 Models

Model Name	Parameters	Context	Architecture	License	GPU	Always On Shared Endpoint
llama-2-7b	7B	4K	Llama-2	Meta	A10G+	❌
llama-2-7b-chat	7B	4K	Llama-2	Meta	A10G+	❌
llama-2-13b	13B	4K	Llama-2	Meta	A100	❌
llama-2-13b-chat	13B	4K	Llama-2	Meta	A100	❌
llama-2-70b	70B	4K	Llama-2	Meta	A100	❌
llama-2-70b-chat	70B	4K	Llama-2	Meta	A100	❌

Code Llama Models

Model Name	Parameters	Context	Architecture	License	GPU	Always On Shared Endpoint
codellama-7b	7B	4K	Llama-2	Meta	A10G+	❌
codellama-7b-instruct	7B	4K	Llama-2	Meta	A10G+	❌
codellama-13b-instruct	13B	4K	Llama-2	Meta	A100	❌
codellama-70b-instruct	70B	4K	Llama-2	Meta	A100	❌

Qwen Models

Model Name	Parameters	Context	Architecture	License	GPU	Always On Shared Endpoint
qwen3-8b	8.19B	64K	Qwen	Tongyi Qianwen	A100	✅
qwen3-14b	14.8B	16K	Qwen	Tongyi Qianwen	A100	❌
qwen3-32b	32.8B	16K	Qwen	Tongyi Qianwen	A100	✅
qwen3-30b-a3b	30.5B	16K	Qwen	Tongyi Qianwen	A100	❌
qwen2-5-coder-3b-instruct	3.09B	32K	Qwen	Tongyi Qianwen	A100	❌
qwen2-5-coder-7b-instruct	7.62B	32K	Qwen	Tongyi Qianwen	A100	❌
qwen2-5-coder-32b-instruct	32.8B	16K	Qwen	Tongyi Qianwen	A100	❌
qwen2-5-1-5b	1.5B	64K	Qwen	Tongyi Qianwen	A100	❌
qwen2-5-1-5b-instruct	1.5B	64K	Qwen	Tongyi Qianwen	A100	❌
qwen2-5-7b	7B	32K	Qwen	Tongyi Qianwen	A100	❌
qwen2-5-7b-instruct	7B	32K	Qwen	Tongyi Qianwen	A100	❌
qwen2-5-14b	14B	32K	Qwen	Tongyi Qianwen	A100	❌
qwen2-5-14b-instruct	14B	32K	Qwen	Tongyi Qianwen	A100	❌
qwen2-5-32b	32B	16K	Qwen	Tongyi Qianwen	A100	❌
qwen2-5-32b-instruct	32B	16K	Qwen	Tongyi Qianwen	A100	❌
qwen2-72b	72.7B	32K	Qwen	Tongyi Qianwen	A100	❌
qwen2-72b-instruct	72.7B	32K	Qwen	Tongyi Qianwen	A100	❌

Solar Models

Model Name	Parameters	Context	Architecture	License	GPU	Always On Shared Endpoint
solar-1-mini-chat-240612	10.7B	32K	Llama	Custom License	A100	❌
solar-pro-preview-instruct-v2	22.1B	4K	Solar	Custom License	A100	❌
solar-pro-241126	22.1B	32K	Solar	Custom License	A100	❌
solar-pro-preview-instruct (deprecated)	22.1B	4K	Solar	Custom License	A100	❌

Gemma Models

Model Name	Parameters	Context	Architecture	License	GPU	Always On Shared Endpoint
gemma-2b	2.5B	8K	Gemma	Google	A10G+	❌
gemma-2b-instruct	2.5B	8K	Gemma	Google	A10G+	❌
gemma-7b	8.5B	8K	Gemma	Google	A100	❌
gemma-7b-instruct	8.5B	8K	Gemma	Google	A100	❌
gemma-2-9b	9.24B	8K	Gemma	Google	A100	❌
gemma-2-9b-instruct	9.24B	8K	Gemma	Google	A100	❌
gemma-2-27b	27.2B	8K	Gemma	Google	A100	❌
gemma-2-27b-instruct	27.2B	8K	Gemma	Google	A100	❌

Other Models

Model Name	Parameters	Context	Architecture	License	GPU	Always On Shared Endpoint
zephyr-7b-beta	7B	32K	Mistral	MIT	A100	❌
phi-2	2.7B	2K	Phi-2	MIT	A10G+	❌
phi-3-mini-4k-instruct	3.8B	4K	Phi-3	MIT	A10G+	❌
phi-3-5-mini-instruct	3.8B	64K	Phi-3	MIT	A100	❌
openhands-lm-32b-v0.1	32.8B	16K	Qwen	Tongyi Qianwen	A100	❌

For detailed information about how to properly prompt each model, see our Chat Templates guide.

Custom Base Models

Predibase allows you to deploy custom (public or private) base models from Hugging Face.

Model Requirements

Before deploying a custom model, verify these requirements:

Architecture Compatibility
- Uses one of the supported vLLM architectures
- Has the “Text Generation” and “Transformer” tags
- Does not have a “custom_code” tag
Format Requirements
- Complete model weights
- Proper configuration files
- Compatible tokenizer implementation
- Correct metadata and tags
- Clear licensing for commercial use or private model

Deploying Custom Models

Deploy a custom model from Hugging Face Hub:

deployment = pb.deployments.create(
    name="my-custom-model",
    config=DeploymentConfig(
        base_model="BioMistral/BioMistral-7B",
        accelerator="l40s_48gb_100", # Required for custom models
        hf_token="<YOUR HUGGINGFACE TOKEN>", # Required for private Huggingface models
        min_replicas=0,
        max_replicas=1,
    )
)

# Create client for the custom model
client = pb.deployments.client("my-custom-model")

# Generate text
response = client.generate(
    "What are some major proteins found in the human body and what is their structure?",
    max_new_tokens=300,
    temperature=0.7
)

Next Steps

View chat templates for each model
Try shared endpoints for quick testing
Set up private deployments for production
Use Fine-tuned Models to customize models
Explore vision models for image tasks

Getting Started

Inference

Fine-Tuning

Account

Integrations

Examples

Resources

Quick Start

Using Shared Endpoints

Creating Private Deployments

Officially Supported Models

DeepSeek Models

Mistral & Mixtral Models

Llama 3 Models

Llama 2 Models

Code Llama Models

Qwen Models

Solar Models

Gemma Models

Other Models

Custom Base Models

Model Requirements

Deploying Custom Models

Next Steps

Getting Started

Inference

Fine-Tuning

Account

Integrations

Examples

Resources

​Quick Start

​Using Shared Endpoints

​Creating Private Deployments

​Officially Supported Models

​DeepSeek Models

​Mistral & Mixtral Models

​Llama 3 Models

​Llama 2 Models

​Code Llama Models

​Qwen Models

​Solar Models

​Gemma Models

​Other Models

​Custom Base Models

​Model Requirements

​Deploying Custom Models

​Next Steps

Quick Start

Using Shared Endpoints

Creating Private Deployments

Officially Supported Models

DeepSeek Models

Mistral & Mixtral Models

Llama 3 Models

Llama 2 Models

Code Llama Models

Qwen Models

Solar Models

Gemma Models

Other Models

Custom Base Models

Model Requirements

Deploying Custom Models

Next Steps