Predibase supports two categories of language models for deployment:

  1. Officially Supported LLMs - Models for which we provide first-class support, meaning they have been verified and are guaranteed to work well. They are also available as Shared Endpoints for SaaS customers.
  2. Custom Base Models - Predibase offers best-effort support for deploying custom base models from Hugging Face.

Quick Start

First, install the Predibase Python SDK:

pip install -U predibase

Using Shared Endpoints

Get started quickly with our pre-deployed shared endpoints:

from predibase import Predibase

# Initialize the client
pb = Predibase(api_token="<PREDIBASE_API_TOKEN>")

# Use the Llama 3.2 3B Instruct shared endpoint - no deployment needed
client = pb.deployments.client("llama-3-2-3b-instruct")
response = client.generate("What is machine learning?")
print(response.generated_text)
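
Shared endpoint clients can also stream tokens as they are produced. A minimal sketch, assuming the client returned by pb.deployments.client exposes a LoRAX-style generate_stream method:

# Stream tokens as they arrive (generate_stream is assumed here,
# following the LoRAX client API that the SDK wraps)
for chunk in client.generate_stream("What is machine learning?"):
    if not chunk.token.special:
        print(chunk.token.text, end="", flush=True)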

Creating Private Deployments

For production use cases, create your own dedicated deployment:

from predibase import Predibase, DeploymentConfig

pb = Predibase(api_token="<PREDIBASE_API_TOKEN>")

# Create a production deployment with Llama 3
deployment = pb.deployments.create(
    name="my-llama-3-2-3b-instruct",
    config=DeploymentConfig(
        base_model="llama-3-2-3b-instruct",
        min_replicas=0,  # Scale to zero when idle so you aren't charged between requests
        max_replicas=1,
        accelerator="l40s_48gb_100"  # Specify hardware requirements
    )
)

# Use your private deployment
client = pb.deployments.client("my-llama-3-2-3b-instruct")
response = client.generate("Explain the theory of relativity.")
print(response.generated_text)
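
New deployments can take a few minutes to become ready. A small sketch for checking status before sending traffic, assuming the SDK exposes a pb.deployments.get accessor whose return value carries a status field:

# Fetch deployment metadata and confirm readiness before routing traffic
# (pb.deployments.get and the status field are assumptions about the SDK surface)
dep = pb.deployments.get("my-llama-3-2-3b-instruct")
print(dep.name, dep.status)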

Officially Supported Models

These models are fully tested, optimized, and supported by Predibase:

DeepSeek Models

| Model Name | Parameters | Context | Architecture | License | GPU | Always On Shared Endpoint |
| --- | --- | --- | --- | --- | --- | --- |
| deepseek-r1-distill-qwen-32b | 32.8B | 8K | Qwen | MIT | A100 | |

Additional DeepSeek R1 and V3 models (original or distilled) are available upon request. Contact sales@predibase.com to deploy other DeepSeek models.

Mistral & Mixtral Models

| Model Name | Parameters | Context | Architecture | License | GPU | Always On Shared Endpoint |
| --- | --- | --- | --- | --- | --- | --- |
| mistral-7b-instruct-v0-2 | 7B | 32K | Mistral | Apache 2.0 | A100 | |
| mixtral-8x7b-instruct-v0-1 | 47B | 8K | Mistral MoE | Apache 2.0 | A100 | |

Llama 3 Models

| Model Name | Parameters | Context | Architecture | License | GPU | Always On Shared Endpoint |
| --- | --- | --- | --- | --- | --- | --- |
| llama-3-3-70b-instruct | 70B | 32K | Llama-3 | Meta | A100 | |
| llama-3-2-1b | 1B | 64K | Llama-3 | Meta | A100 | |
| llama-3-2-1b-instruct | 1B | 64K | Llama-3 | Meta | A100 | |
| llama-3-2-3b | 3B | 32K | Llama-3 | Meta | A100 | |
| llama-3-2-3b-instruct | 3B | 32K | Llama-3 | Meta | A100 | |
| llama-3-1-8b | 8B | 64K | Llama-3 | Meta | A100 | |
| llama-3-1-8b-instruct | 8B | 64K | Llama-3 | Meta | A100 | |
| llama-3-8b | 8B | 8K | Llama-3 | Meta | A10G+ | |
| llama-3-8b-instruct | 8B | 8K | Llama-3 | Meta | A10G+ | |
| llama-3-70b | 70B | 8K | Llama-3 | Meta | A100 | |
| llama-3-70b-instruct | 70B | 8K | Llama-3 | Meta | A100 | |

Llama 2 Models

| Model Name | Parameters | Context | Architecture | License | GPU | Always On Shared Endpoint |
| --- | --- | --- | --- | --- | --- | --- |
| llama-2-7b | 7B | 4K | Llama-2 | Meta | A10G+ | |
| llama-2-7b-chat | 7B | 4K | Llama-2 | Meta | A10G+ | |
| llama-2-13b | 13B | 4K | Llama-2 | Meta | A100 | |
| llama-2-13b-chat | 13B | 4K | Llama-2 | Meta | A100 | |
| llama-2-70b | 70B | 4K | Llama-2 | Meta | A100 | |
| llama-2-70b-chat | 70B | 4K | Llama-2 | Meta | A100 | |

Code Llama Models

| Model Name | Parameters | Context | Architecture | License | GPU | Always On Shared Endpoint |
| --- | --- | --- | --- | --- | --- | --- |
| codellama-7b | 7B | 4K | Llama-2 | Meta | A10G+ | |
| codellama-7b-instruct | 7B | 4K | Llama-2 | Meta | A10G+ | |
| codellama-13b-instruct | 13B | 4K | Llama-2 | Meta | A100 | |
| codellama-70b-instruct | 70B | 4K | Llama-2 | Meta | A100 | |

Qwen Models

| Model Name | Parameters | Context | Architecture | License | GPU | Always On Shared Endpoint |
| --- | --- | --- | --- | --- | --- | --- |
| qwen3-8b | 8.19B | 64K | Qwen | Tongyi Qianwen | A100 | |
| qwen3-14b | 14.8B | 16K | Qwen | Tongyi Qianwen | A100 | |
| qwen3-32b | 32.8B | 16K | Qwen | Tongyi Qianwen | A100 | |
| qwen3-30b-a3b | 30.5B | 16K | Qwen | Tongyi Qianwen | A100 | |
| qwen2-5-coder-3b-instruct | 3.09B | 32K | Qwen | Tongyi Qianwen | A100 | |
| qwen2-5-coder-7b-instruct | 7.62B | 32K | Qwen | Tongyi Qianwen | A100 | |
| qwen2-5-coder-32b-instruct | 32.8B | 16K | Qwen | Tongyi Qianwen | A100 | |
| qwen2-5-1-5b | 1.5B | 64K | Qwen | Tongyi Qianwen | A100 | |
| qwen2-5-1-5b-instruct | 1.5B | 64K | Qwen | Tongyi Qianwen | A100 | |
| qwen2-5-7b | 7B | 32K | Qwen | Tongyi Qianwen | A100 | |
| qwen2-5-7b-instruct | 7B | 32K | Qwen | Tongyi Qianwen | A100 | |
| qwen2-5-14b | 14B | 32K | Qwen | Tongyi Qianwen | A100 | |
| qwen2-5-14b-instruct | 14B | 32K | Qwen | Tongyi Qianwen | A100 | |
| qwen2-5-32b | 32B | 16K | Qwen | Tongyi Qianwen | A100 | |
| qwen2-5-32b-instruct | 32B | 16K | Qwen | Tongyi Qianwen | A100 | |
| qwen2-72b | 72.7B | 32K | Qwen | Tongyi Qianwen | A100 | |
| qwen2-72b-instruct | 72.7B | 32K | Qwen | Tongyi Qianwen | A100 | |

Solar Models

| Model Name | Parameters | Context | Architecture | License | GPU | Always On Shared Endpoint |
| --- | --- | --- | --- | --- | --- | --- |
| solar-1-mini-chat-240612 | 10.7B | 32K | Llama | Custom License | A100 | |
| solar-pro-preview-instruct-v2 | 22.1B | 4K | Solar | Custom License | A100 | |
| solar-pro-241126 | 22.1B | 32K | Solar | Custom License | A100 | |
| solar-pro-preview-instruct (deprecated) | 22.1B | 4K | Solar | Custom License | A100 | |

Gemma Models

| Model Name | Parameters | Context | Architecture | License | GPU | Always On Shared Endpoint |
| --- | --- | --- | --- | --- | --- | --- |
| gemma-2b | 2.5B | 8K | Gemma | Google | A10G+ | |
| gemma-2b-instruct | 2.5B | 8K | Gemma | Google | A10G+ | |
| gemma-7b | 8.5B | 8K | Gemma | Google | A100 | |
| gemma-7b-instruct | 8.5B | 8K | Gemma | Google | A100 | |
| gemma-2-9b | 9.24B | 8K | Gemma | Google | A100 | |
| gemma-2-9b-instruct | 9.24B | 8K | Gemma | Google | A100 | |
| gemma-2-27b | 27.2B | 8K | Gemma | Google | A100 | |
| gemma-2-27b-instruct | 27.2B | 8K | Gemma | Google | A100 | |

Other Models

| Model Name | Parameters | Context | Architecture | License | GPU | Always On Shared Endpoint |
| --- | --- | --- | --- | --- | --- | --- |
| zephyr-7b-beta | 7B | 32K | Mistral | MIT | A100 | |
| phi-2 | 2.7B | 2K | Phi-2 | MIT | A10G+ | |
| phi-3-mini-4k-instruct | 3.8B | 4K | Phi-3 | MIT | A10G+ | |
| phi-3-5-mini-instruct | 3.8B | 64K | Phi-3 | MIT | A100 | |
| openhands-lm-32b-v0.1 | 32.8B | 16K | Qwen | Tongyi Qianwen | A100 | |

For detailed information about how to properly prompt each model, see our Chat Templates guide.
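
Instruct- and chat-tuned models only perform well when prompted in their expected format. As an illustration (not Predibase-specific), you can render a model's chat template locally with the transformers tokenizer and pass the formatted prompt to an existing deployment client; the model ID below is illustrative:

from transformers import AutoTokenizer

# Render the model's chat template locally, then send the formatted
# prompt to the deployment
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-3.2-3B-Instruct")
prompt = tokenizer.apply_chat_template(
    [{"role": "user", "content": "What is machine learning?"}],
    tokenize=False,
    add_generation_prompt=True,
)
response = client.generate(prompt)
print(response.generated_text)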

Custom Base Models

Predibase allows you to deploy custom (public or private) base models from Hugging Face.

Model Requirements

Before deploying a custom model, verify these requirements (a programmatic pre-check is sketched after the list):

  1. Architecture Compatibility

    • Uses one of the supported vLLM architectures
    • Has the “Text Generation” and “Transformers” tags
    • Does not have a “custom_code” tag
  2. Format Requirements

    • Complete model weights
    • Proper configuration files
    • Compatible tokenizer implementation
    • Correct metadata and tags
    • Clear licensing for commercial use (or a private model you have rights to use)
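
A quick pre-check, sketched with the public huggingface_hub API (the repo ID is illustrative, and the reported architecture still needs to be compared against vLLM's supported list):

import json
from huggingface_hub import hf_hub_download, model_info

repo_id = "BioMistral/BioMistral-7B"  # illustrative repo ID

# Check the pipeline and tags on the Hugging Face Hub
info = model_info(repo_id)
assert info.pipeline_tag == "text-generation", "not a text-generation model"
assert "custom_code" not in (info.tags or []), "custom_code models are unsupported"

# Inspect the architecture declared in config.json
config_path = hf_hub_download(repo_id, "config.json")
with open(config_path) as f:
    print("Architectures:", json.load(f).get("architectures", []))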

Deploying Custom Models

Deploy a custom model from Hugging Face Hub:

deployment = pb.deployments.create(
    name="my-custom-model",
    config=DeploymentConfig(
        base_model="BioMistral/BioMistral-7B",
        accelerator="l40s_48gb_100", # Required for custom models
        hf_token="<YOUR HUGGINGFACE TOKEN>", # Required for private Hugging Face models
        min_replicas=0,
        max_replicas=1,
    )
)

# Create client for the custom model
client = pb.deployments.client("my-custom-model")

# Generate text
response = client.generate(
    "What are some major proteins found in the human body and what is their structure?",
    max_new_tokens=300,
    temperature=0.7
)
print(response.generated_text)
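
If the deployment was only for experimentation, tear it down when you are done; the sketch below assumes the SDK exposes a pb.deployments.delete method:

# Remove the deployment when finished (pb.deployments.delete is assumed
# here; check the SDK reference for the exact method name)
pb.deployments.delete("my-custom-model")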

Next Steps