# Documentation Index
Fetch the complete documentation index at: https://docs.predibase.com/llms.txt
Use this file to discover all available pages before exploring further.
Predibase supports two categories of language models for deployment:
- Officially Supported LLMs - Models for which we provide first-class support, meaning they have been verified and are guaranteed to work well. They are also available as Shared Endpoints for SaaS customers.
- Custom Base Models - Predibase offers best-effort support for deploying custom base models from Hugging Face.
## Quick Start
First, install the Predibase Python SDK:
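```shell
pip install -U predibase
```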
### Using Shared Endpoints
Get started quickly with our pre-deployed shared endpoints:
```python
from predibase import Predibase

# Initialize the client
pb = Predibase(api_token="<PREDIBASE_API_TOKEN>")

# Use the Llama 3.2 3B Instruct shared endpoint - no deployment needed
client = pb.deployments.client("llama-3-2-3b-instruct")
response = client.generate("What is machine learning?")
print(response.generated_text)
```
### Creating Private Deployments
For production use cases, create your own dedicated deployment:
```python
from predibase import Predibase, DeploymentConfig

pb = Predibase(api_token="<PREDIBASE_API_TOKEN>")

# Create a production deployment with Llama 3.2 3B Instruct
deployment = pb.deployments.create(
    name="my-llama-3-2-3b-instruct",
    config=DeploymentConfig(
        base_model="llama-3-2-3b-instruct",
        min_replicas=0,  # Scale to 0 replicas so you aren't charged when there are no requests
        max_replicas=1,
        accelerator="l40s_48gb_100",  # Specify hardware requirements
    ),
)

# Use your private deployment
client = pb.deployments.client("my-llama-3-2-3b-instruct")
response = client.generate("Explain the theory of relativity.")
print(response.generated_text)
```
## Officially Supported Models
These models are fully tested, optimized, and supported by Predibase:
### DeepSeek Models

| Model Name | Parameters | Context | Architecture | License | GPU | Always On Shared Endpoint |
|---|---|---|---|---|---|---|
| deepseek-r1-distill-qwen-32b | 32.8B | 8K | Qwen | MIT | A100 | ❌ |
Additional DeepSeek R1 and V3 models (original or distilled) are available upon request. Contact sales@predibase.com to deploy other DeepSeek models.
### Mistral & Mixtral Models

| Model Name | Parameters | Context | Architecture | License | GPU | Always On Shared Endpoint |
|---|---|---|---|---|---|---|
| mistral-7b-instruct-v0-2 | 7B | 32K | Mistral | Apache 2.0 | A100 | ❌ |
| mixtral-8x7b-instruct-v0-1 | 47B | 8K | Mistral MoE | Apache 2.0 | A100 | ❌ |
### Llama 3 Models

| Model Name | Parameters | Context | Architecture | License | GPU | Always On Shared Endpoint |
|---|---|---|---|---|---|---|
| llama-3-3-70b-instruct | 70B | 32K | Llama-3 | Meta | A100 | ❌ |
| llama-3-2-1b | 1B | 64K | Llama-3 | Meta | A100 | ❌ |
| llama-3-2-1b-instruct | 1B | 64K | Llama-3 | Meta | A100 | ❌ |
| llama-3-2-3b | 3B | 32K | Llama-3 | Meta | A100 | ❌ |
| llama-3-2-3b-instruct | 3B | 32K | Llama-3 | Meta | A100 | ❌ |
| llama-3-1-8b | 8B | 64K | Llama-3 | Meta | A100 | ❌ |
| llama-3-1-8b-instruct | 8B | 64K | Llama-3 | Meta | A100 | ✅ |
| llama-3-8b | 8B | 8K | Llama-3 | Meta | A10G+ | ❌ |
| llama-3-8b-instruct | 8B | 8K | Llama-3 | Meta | A10G+ | ❌ |
| llama-3-70b | 70B | 8K | Llama-3 | Meta | A100 | ❌ |
| llama-3-70b-instruct | 70B | 8K | Llama-3 | Meta | A100 | ❌ |
### Llama 2 Models

| Model Name | Parameters | Context | Architecture | License | GPU | Always On Shared Endpoint |
|---|---|---|---|---|---|---|
| llama-2-7b | 7B | 4K | Llama-2 | Meta | A10G+ | ❌ |
| llama-2-7b-chat | 7B | 4K | Llama-2 | Meta | A10G+ | ❌ |
| llama-2-13b | 13B | 4K | Llama-2 | Meta | A100 | ❌ |
| llama-2-13b-chat | 13B | 4K | Llama-2 | Meta | A100 | ❌ |
| llama-2-70b | 70B | 4K | Llama-2 | Meta | A100 | ❌ |
| llama-2-70b-chat | 70B | 4K | Llama-2 | Meta | A100 | ❌ |
### Code Llama Models

| Model Name | Parameters | Context | Architecture | License | GPU | Always On Shared Endpoint |
|---|---|---|---|---|---|---|
| codellama-7b | 7B | 4K | Llama-2 | Meta | A10G+ | ❌ |
| codellama-7b-instruct | 7B | 4K | Llama-2 | Meta | A10G+ | ❌ |
| codellama-13b-instruct | 13B | 4K | Llama-2 | Meta | A100 | ❌ |
| codellama-70b-instruct | 70B | 4K | Llama-2 | Meta | A100 | ❌ |
### Qwen Models

| Model Name | Parameters | Context | Architecture | License | GPU | Always On Shared Endpoint |
|---|---|---|---|---|---|---|
| qwen3-8b | 8.19B | 64K | Qwen | Tongyi Qianwen | A100 | ✅ |
| qwen3-14b | 14.8B | 16K | Qwen | Tongyi Qianwen | A100 | ❌ |
| qwen3-32b | 32.8B | 16K | Qwen | Tongyi Qianwen | A100 | ✅ |
| qwen3-30b-a3b | 30.5B | 16K | Qwen | Tongyi Qianwen | A100 | ❌ |
| qwen2-5-coder-3b-instruct | 3.09B | 32K | Qwen | Tongyi Qianwen | A100 | ❌ |
| qwen2-5-coder-7b-instruct | 7.62B | 32K | Qwen | Tongyi Qianwen | A100 | ❌ |
| qwen2-5-coder-32b-instruct | 32.8B | 16K | Qwen | Tongyi Qianwen | A100 | ❌ |
| qwen2-5-1-5b | 1.5B | 64K | Qwen | Tongyi Qianwen | A100 | ❌ |
| qwen2-5-1-5b-instruct | 1.5B | 64K | Qwen | Tongyi Qianwen | A100 | ❌ |
| qwen2-5-7b | 7B | 32K | Qwen | Tongyi Qianwen | A100 | ❌ |
| qwen2-5-7b-instruct | 7B | 32K | Qwen | Tongyi Qianwen | A100 | ❌ |
| qwen2-5-14b | 14B | 32K | Qwen | Tongyi Qianwen | A100 | ❌ |
| qwen2-5-14b-instruct | 14B | 32K | Qwen | Tongyi Qianwen | A100 | ❌ |
| qwen2-5-32b | 32B | 16K | Qwen | Tongyi Qianwen | A100 | ❌ |
| qwen2-5-32b-instruct | 32B | 16K | Qwen | Tongyi Qianwen | A100 | ❌ |
| qwen2-72b | 72.7B | 32K | Qwen | Tongyi Qianwen | A100 | ❌ |
| qwen2-72b-instruct | 72.7B | 32K | Qwen | Tongyi Qianwen | A100 | ❌ |
### Solar Models

| Model Name | Parameters | Context | Architecture | License | GPU | Always On Shared Endpoint |
|---|---|---|---|---|---|---|
| solar-1-mini-chat-240612 | 10.7B | 32K | Llama | Custom License | A100 | ❌ |
| solar-pro-preview-instruct-v2 | 22.1B | 4K | Solar | Custom License | A100 | ❌ |
| solar-pro-241126 | 22.1B | 32K | Solar | Custom License | A100 | ❌ |
| solar-pro-preview-instruct (deprecated) | 22.1B | 4K | Solar | Custom License | A100 | ❌ |
### Gemma Models

| Model Name | Parameters | Context | Architecture | License | GPU | Always On Shared Endpoint |
|---|---|---|---|---|---|---|
| gemma-2b | 2.5B | 8K | Gemma | Google | A10G+ | ❌ |
| gemma-2b-instruct | 2.5B | 8K | Gemma | Google | A10G+ | ❌ |
| gemma-7b | 8.5B | 8K | Gemma | Google | A100 | ❌ |
| gemma-7b-instruct | 8.5B | 8K | Gemma | Google | A100 | ❌ |
| gemma-2-9b | 9.24B | 8K | Gemma | Google | A100 | ❌ |
| gemma-2-9b-instruct | 9.24B | 8K | Gemma | Google | A100 | ❌ |
| gemma-2-27b | 27.2B | 8K | Gemma | Google | A100 | ❌ |
| gemma-2-27b-instruct | 27.2B | 8K | Gemma | Google | A100 | ❌ |
### Other Models

| Model Name | Parameters | Context | Architecture | License | GPU | Always On Shared Endpoint |
|---|---|---|---|---|---|---|
| zephyr-7b-beta | 7B | 32K | Mistral | MIT | A100 | ❌ |
| phi-2 | 2.7B | 2K | Phi-2 | MIT | A10G+ | ❌ |
| phi-3-mini-4k-instruct | 3.8B | 4K | Phi-3 | MIT | A10G+ | ❌ |
| phi-3-5-mini-instruct | 3.8B | 64K | Phi-3 | MIT | A100 | ❌ |
| openhands-lm-32b-v0.1 | 32.8B | 16K | Qwen | Tongyi Qianwen | A100 | ❌ |
For detailed information about how to prompt each model correctly, see our Chat Templates guide.
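As an illustration of what a chat template does, here is a minimal sketch that renders a conversation in the Llama 3 Instruct format. The special tokens below follow Meta's published prompt format; in practice, prefer the model tokenizer's own `apply_chat_template` method over hand-rolling templates.

```python
def format_llama3_chat(messages):
    """Render a list of {"role", "content"} messages in the Llama 3 Instruct format."""
    prompt = "<|begin_of_text|>"
    for msg in messages:
        prompt += (
            f"<|start_header_id|>{msg['role']}<|end_header_id|>\n\n"
            f"{msg['content']}<|eot_id|>"
        )
    # Cue the model to produce the assistant's reply
    prompt += "<|start_header_id|>assistant<|end_header_id|>\n\n"
    return prompt

prompt = format_llama3_chat([
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What is machine learning?"},
])
```

A prompt built this way can then be passed to `client.generate(prompt)` on a Llama 3 Instruct deployment.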
## Custom Base Models
Predibase allows you to deploy custom (public or private) base models from Hugging Face.
### Model Requirements
Before deploying a custom model, verify these requirements:
- Architecture Compatibility
- Format Requirements
  - Complete model weights
  - Proper configuration files
  - Compatible tokenizer implementation
  - Correct metadata and tags
  - Clear licensing for commercial use or private models
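The format checks above can be sketched as a quick pre-flight script. The helper below is a hypothetical illustration, not part of the Predibase SDK: it inspects a repo's file listing (which you could obtain with `huggingface_hub.list_repo_files`, for example) for a config, weights, and tokenizer files.

```python
def check_model_repo(files):
    """Return pass/fail results for basic format checks on a Hugging Face repo file listing."""
    return {
        "config": "config.json" in files,
        "weights": any(f.endswith((".safetensors", ".bin")) for f in files),
        "tokenizer": any(f.startswith("tokenizer") for f in files),
    }

# Example file listing from a model repo
files = ["config.json", "model.safetensors", "tokenizer.json", "tokenizer_config.json"]
checks = check_model_repo(files)
assert all(checks.values()), f"Missing requirements: {checks}"
```

Architecture compatibility and licensing still need to be verified by hand; this sketch only covers the mechanical file checks.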
### Deploying Custom Models
Deploy a custom model from Hugging Face Hub:
```python
from predibase import Predibase, DeploymentConfig

pb = Predibase(api_token="<PREDIBASE_API_TOKEN>")

deployment = pb.deployments.create(
    name="my-custom-model",
    config=DeploymentConfig(
        base_model="BioMistral/BioMistral-7B",
        accelerator="l40s_48gb_100",  # Required for custom models
        hf_token="<YOUR HUGGINGFACE TOKEN>",  # Required for private Hugging Face models
        min_replicas=0,
        max_replicas=1,
    ),
)

# Create a client for the custom model
client = pb.deployments.client("my-custom-model")

# Generate text
response = client.generate(
    "What are some major proteins found in the human body and what is their structure?",
    max_new_tokens=300,
    temperature=0.7,
)
print(response.generated_text)
```
## Next Steps