Documentation Index
Fetch the complete documentation index at: https://docs.predibase.com/llms.txt
Use this file to discover all available pages before exploring further.
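For programmatic discovery, the index is a plain text file and can be fetched like any other URL. Below is a minimal sketch using the requests library (not part of the Predibase SDK):

```python
import requests

# Fetch the documentation index and print its contents (illustrative sketch).
resp = requests.get("https://docs.predibase.com/llms.txt", timeout=30)
resp.raise_for_status()
print(resp.text)
```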
Well-supported LLMs
The following models are currently well-supported for fine-tuning.
Large Language Models (Text)
You may fine-tune LoRA, Turbo LoRA, and Turbo adapters on any of these base LLMs. Note that for Turbo LoRA and Turbo adapters, some models may require additional deployment configurations.
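As a hedged illustration of how the adapter type is chosen (the `adapter` field on SFTConfig is an assumption drawn from the fine-tuning guides, not from this page):

```python
from predibase import Predibase, SFTConfig

pb = Predibase(api_token="<API_TOKEN>")

# Assumed: SFTConfig accepts an `adapter` field ("lora", "turbo_lora", or "turbo").
config = SFTConfig(
    base_model="qwen3-8b",
    adapter="turbo_lora",  # some models may need additional deployment configuration
)
```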
Qwen Models
| Model Name | Parameters | Context Window | Supported Fine-Tuning Context Window | Adapter Pre-load Not Required | Supports GRPO | Architecture | License |
|---|---|---|---|---|---|---|---|
| qwen3-30b-a3b | 30.5 billion | 131072 | 32768 | ❌ | ✅ | Qwen | Tongyi Qianwen |
| qwen3-32b | 32.8 billion | 131072 | 32768 | ❌ | ✅ | Qwen | Tongyi Qianwen |
| qwen3-14b | 14.8 billion | 131072 | 32768 | ❌ | ✅ | Qwen | Tongyi Qianwen |
| qwen3-8b | 8.19 billion | 131072 | 32768 | ❌ | ✅ | Qwen | Tongyi Qianwen |
| qwen2-5-32b | 32.8 billion | 131072 | 16384 | ❌ | ✅ | Qwen | Tongyi Qianwen |
| qwen2-5-32b-instruct | 32.8 billion | 32768 | 16384 | ❌ | ✅ | Qwen | Tongyi Qianwen |
| qwen2-5-14b | 14.8 billion | 131072 | 32768 | ❌ | ✅ | Qwen | Tongyi Qianwen |
| qwen2-5-14b-instruct | 14.8 billion | 32768 | 32768 | ❌ | ✅ | Qwen | Tongyi Qianwen |
| qwen2-5-7b | 7.62 billion | 131072 | 32768 | ❌ | ✅ | Qwen | Tongyi Qianwen |
| qwen2-5-7b-instruct | 7.62 billion | 32768 | 32768 | ❌ | ✅ | Qwen | Tongyi Qianwen |
| qwen2-5-3b | 3.09 billion | 131072 | 32768 | ❌ | ✅ | Qwen | Tongyi Qianwen |
| qwen2-5-3b-instruct | 3.09 billion | 32768 | 32768 | ❌ | ✅ | Qwen | Tongyi Qianwen |
| qwen2-5-1-5b | 1.54 billion | 131072 | 32768 | ❌ | ✅ | Qwen | Tongyi Qianwen |
| qwen2-5-1-5b-instruct | 1.54 billion | 32768 | 32768 | ❌ | ✅ | Qwen | Tongyi Qianwen |
| qwen2-5-coder-32b-instruct | 32.8 billion | 131072 | 16384 | ❌ | ✅ | Qwen | Tongyi Qianwen |
| qwen2-5-coder-7b-instruct | 7.62 billion | 131072 | 16384 | ❌ | ✅ | Qwen | Tongyi Qianwen |
| qwen2-5-coder-3b-instruct | 3.09 billion | 131072 | 16384 | ❌ | ✅ | Qwen | Tongyi Qianwen |
| qwen2-7b | 7.62 billion | 131072 | 32768 | ❌ | ❌ | Qwen | Tongyi Qianwen |
| qwen2-1-5b-instruct | 1.54 billion | 131072 | 32768 | ❌ | ❌ | Qwen | Tongyi Qianwen |
| qwen2-1-5b | 1.54 billion | 131072 | 32768 | ❌ | ❌ | Qwen | Tongyi Qianwen |
Llama 3 Models
| Model Name | Parameters | Context Window | Supported Fine-Tuning Context Window | Adapter Pre-load Not Required | Supports GRPO | Architecture | License |
|---|---|---|---|---|---|---|---|
| llama-3-3-70b | 70.6 billion | 131072 | 8192 | ❌ | ❌ | Llama-3 | Meta (request for commercial use) |
| llama-3-2-3b | 3.21 billion | 32768 | 32768 | ❌ | ✅ | Llama-3 | Meta (request for commercial use) |
| llama-3-2-3b-instruct | 3.21 billion | 32768 | 32768 | ❌ | ✅ | Llama-3 | Meta (request for commercial use) |
| llama-3-2-1b | 1.24 billion | 32768 | 32768 | ❌ | ✅ | Llama-3 | Meta (request for commercial use) |
| llama-3-2-1b-instruct | 1.24 billion | 32768 | 32768 | ❌ | ✅ | Llama-3 | Meta (request for commercial use) |
| llama-3-1-8b | 8 billion | 62999 | 32768 | ❌ | ✅ | Llama-3 | Meta (request for commercial use) |
| llama-3-1-8b-instruct | 8 billion | 62999 | 32768 | ✅ | ✅ | Llama-3 | Meta (request for commercial use) |
| llama-3-70b | 70 billion | 8192 | 8192 | ❌ | ❌ | Llama-3 | Meta (request for commercial use) |
| llama-3-70b-instruct | 70 billion | 8192 | 8192 | ❌ | ❌ | Llama-3 | Meta (request for commercial use) |
| llama-3-8b | 8 billion | 8192 | 8192 | ❌ | ❌ | Llama-3 | Meta (request for commercial use) |
| llama-3-8b-instruct | 8 billion | 8192 | 8192 | ❌ | ❌ | Llama-3 | Meta (request for commercial use) |
Llama 2 Models
| Model Name | Parameters | Context Window | Supported Fine-Tuning Context Window | Adapter Pre-load Not Required | Supports GRPO | Architecture | License |
|---|---|---|---|---|---|---|---|
| llama-2-70b | 70 billion | 4096 | 4096 | ❌ | ❌ | Llama-2 | Meta (request for commercial use) |
| llama-2-70b-chat | 70 billion | 4096 | 4096 | ❌ | ❌ | Llama-2 | Meta (request for commercial use) |
| llama-2-13b | 13 billion | 4096 | 4096 | ❌ | ❌ | Llama-2 | Meta (request for commercial use) |
| llama-2-13b-chat | 13 billion | 4096 | 4096 | ❌ | ❌ | Llama-2 | Meta (request for commercial use) |
| llama-2-7b | 7 billion | 4096 | 4096 | ❌ | ❌ | Llama-2 | Meta (request for commercial use) |
| llama-2-7b-chat | 7 billion | 4096 | 4096 | ❌ | ❌ | Llama-2 | Meta (request for commercial use) |
CodeLlama Models
| Model Name | Parameters | Context Window | Supported Fine-Tuning Context Window | Adapter Pre-load Not Required | Supports GRPO | Architecture | License |
|---|---|---|---|---|---|---|---|
| codellama-70b-instruct | 70 billion | 4096 | 4096 | ❌ | ❌ | Llama-2 | Meta (request for commercial use) |
| codellama-13b-instruct | 13 billion | 16384 | 16384 | ❌ | ❌ | Llama-2 | Meta (request for commercial use) |
| codellama-7b | 7 billion | 4096 | 4096 | ❌ | ❌ | Llama-2 | Meta (request for commercial use) |
| codellama-7b-instruct | 7 billion | 4096 | 4096 | ❌ | ❌ | Llama-2 | Meta (request for commercial use) |
Mistral Models
| Model Name | Parameters | Context Window | Supported Fine-Tuning Context Window | Adapter Pre-load Not Required | Supports GRPO | Architecture | License |
|---|---|---|---|---|---|---|---|
| mistral-7b-instruct-v0-3 | 7 billion | 32768 | 32768 | ✅ | ❌ | Mistral | Apache 2.0 |
| mistral-7b-instruct-v0-2 | 7 billion | 32768 | 32768 | ✅ | ❌ | Mistral | Apache 2.0 |
| mistral-7b | 7 billion | 32768 | 32768 | ✅ | ❌ | Mistral | Apache 2.0 |
| mistral-7b-instruct | 7 billion | 32768 | 32768 | ✅ | ❌ | Mistral | Apache 2.0 |
| mistral-nemo-12b-2407 | 12 billion | 131072 | 32768 | ❌ | ❌ | Mistral | Apache 2.0 |
| mistral-nemo-12b-instruct-2407 | 12 billion | 131072 | 32768 | ❌ | ❌ | Mistral | Apache 2.0 |
| zephyr-7b-beta | 7 billion | 32768 | 32768 | ✅ | ❌ | Mistral | MIT |
| mixtral-8x7b-instruct-v0-1 | 46.7 billion | 32768 | 7168 | ❌ | ❌ | Mixtral | Apache 2.0 |
Solar Models
| Model Name | Parameters | Context Window | Supported Fine-Tuning Context Window | Adapter Pre-load Not Required | Supports GRPO | Architecture | License |
|---|---|---|---|---|---|---|---|
| solar-1-mini-chat-240612 | 10.7 billion | 32768 | 32768 | ✅ | ❌ | Llama | Custom License |
| solar-pro-preview-instruct-v2 | 22.1 billion | 4096 | 4096 | ❌ | ❌ | Solar | Custom License |
| solar-pro-241126 | 22.1 billion | 32768 | 16384 | ❌ | ❌ | Solar | Custom License |
Gemma Models
| Model Name | Parameters | Context Window | Supported Fine-Tuning Context Window | Adapter Pre-load Not Required | Supports GRPO | Architecture | License |
|---|---|---|---|---|---|---|---|
| gemma-2-27b | 27.2 billion | 8192 | 4096 | ❌ | ❌ | Gemma | Google |
| gemma-2-27b-instruct | 27.2 billion | 8192 | 4096 | ❌ | ❌ | Gemma | Google |
| gemma-2-9b | 9.24 billion | 8192 | 8192 | ❌ | ❌ | Gemma | Google |
| gemma-2-9b-instruct | 9.24 billion | 8192 | 8192 | ❌ | ❌ | Gemma | Google |
| gemma-7b | 8.5 billion | 8192 | 8192 | ❌ | ❌ | Gemma | Google |
| gemma-7b-instruct | 8.5 billion | 8192 | 8192 | ❌ | ❌ | Gemma | Google |
| gemma-2b | 2.5 billion | 8192 | 8192 | ❌ | ❌ | Gemma | Google |
| gemma-2b-instruct | 2.5 billion | 8192 | 8192 | ❌ | ❌ | Gemma | Google |
Phi Models
| Model Name | Parameters | Context Window | Supported Fine-Tuning Context Window | Adapter Pre-load Not Required | Supports GRPO | Architecture | License |
|---|---|---|---|---|---|---|---|
| phi-3-5-mini-instruct | 3.8 billion | 131072 | 16384 | ❌ | ✅ | Phi-3 | Microsoft |
| phi-3-mini-4k-instruct | 3.8 billion | 4096 | 4096 | Turbo LoRA not supported | ✅ | Phi-3 | Microsoft |
| phi-2 | 2.7 billion | 2048 | 2048 | Turbo not supported | ✅ | Phi-2 | Microsoft |
Other Models
| Model Name | Parameters | Context Window | Supported Fine-Tuning Context Window | Adapter Pre-load Not Required | Supports GRPO | Architecture | License |
|---|---|---|---|---|---|---|---|
| deepseek-r1-distill-qwen-32b | 32.8 billion | 131072 | 8000 | ❌ | ✅ | Qwen | DeepSeek-AI |
| openhands-lm-32b-v0.1 | 32.8 billion | 131072 | 16384 | ❌ | ✅ | Qwen | Xingyao Wang |
Many of the latest OSS models are released in two variants:
- Base model (llama-2-7b, etc.): These are models primarily trained on the objective of text completion.
- Instruction-tuned (llama-2-7b-chat, mistral-7b-instruct, etc.): These are models that have been further trained on (instruction, output) pairs in order to better respond to human instruction-styled inputs. The instructions effectively constrain the model’s output to align with the desired response characteristics or domain knowledge.
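To make the distinction concrete, here is an illustrative (not Predibase-specific) pair of prompts, one per variant:

```python
# Base model: raw text for the model to complete.
base_prompt = "The mitochondria is the powerhouse of"

# Instruction-tuned model: an instruction-styled request.
instruct_prompt = "Summarize the following article in two sentences:\n\n<article text>"
```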
Vision Language Models
Currently, we support fine-tuning LoRAs for the following vision language models:
Qwen VL
| Model Name | Parameters | Architecture | License | LoRA | Turbo LoRA | Turbo | Context Window | Supported Fine-Tuning Context Window | Supports GRPO |
|---|---|---|---|---|---|---|---|---|---|
| qwen2-vl-7b-instruct | 7B | Qwen2 | Tongyi Qianwen | ✅ | ❌ | ❌ | 128000 | 32768 | ❌ |
| qwen2-5-vl-3b-instruct | 3B | Qwen2.5 | Tongyi Qianwen | ✅ | ❌ | ❌ | 128000 | 32768 | ❌ |
| qwen2-5-vl-7b-instruct | 7B | Qwen2.5 | Tongyi Qianwen | ✅ | ❌ | ❌ | 128000 | 32768 | ❌ |
To get started with VLM fine-tuning, check out the user guide on how to format your data for training and test inference.
Best-Effort LLMs (via HuggingFace)
Best-effort support for fine-tuning is also offered for any Huggingface LLM meeting the following criteria:
- Has the “Text Generation” and “Transformer” tags
- Does not have a “custom_code” tag
- Is not post-quantized (ex. does not contain a quantization method such as “AWQ” in the name)
- Has text inputs and outputs
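One way to sanity-check a candidate model against these criteria before launching a job is to inspect its metadata with the huggingface_hub client. The check below is an approximation for illustration, not the exact validation Predibase performs:

```python
from huggingface_hub import model_info

# Inspect a candidate model's metadata on the Hugging Face Hub.
info = model_info("BioMistral/BioMistral-7B")

is_text_generation = info.pipeline_tag == "text-generation"
has_custom_code = "custom_code" in (info.tags or [])
# Heuristic only: look for common quantization markers in the repo name.
looks_quantized = any(q in info.id.lower() for q in ("awq", "gptq", "gguf"))

print(f"text-generation: {is_text_generation}")
print(f"custom_code tag: {has_custom_code}")
print(f"looks post-quantized: {looks_quantized}")
```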
Fine-tuning a custom LLM
- Get the Huggingface ID for your model by clicking the copy icon on the base model’s Huggingface page, ex. BioMistral/BioMistral-7B.
- Pass the Huggingface ID as the base_model:
```python
from predibase import Predibase, SFTConfig

pb = Predibase(api_token="<API_TOKEN>")

# Create an adapter repository
repo = pb.repos.create(name="bio-summarizer", description="Bio News Summarizer", exists_ok=True)

# Start a fine-tuning job; this call blocks until training is finished
adapter = pb.adapters.create(
    config=SFTConfig(
        base_model="BioMistral/BioMistral-7B"
    ),
    dataset="bio-dataset",
    repo=repo,
    description="initial model with defaults"
)
```
Predibase training metrics will be automatically streamed to stdout. To view additional metrics via TensorBoard, pass show_tensorboard=True to the create call:
```python
from predibase import SFTConfig

adapter = pb.adapters.create(
    config=SFTConfig(
        base_model="BioMistral/BioMistral-7B"
    ),
    dataset="bio-dataset",
    repo=repo,
    description="initial model with defaults",
    show_tensorboard=True
)
```
Note that TensorBoard data may take some time to refresh. Predibase also supports integrations with Weights & Biases and Comet.
Note that if you fine-tune a custom model that is not on our shared deployments list, you’ll need to deploy the custom base model as a private serverless deployment in order to run inference on your newly trained adapter. This is also supported on a best-effort basis.
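As a hedged sketch of that last step (the DeploymentConfig fields, deployment name, and client calls below are assumptions based on the private deployments and inference guides, not verbatim from this page), deploying the custom base model and then prompting it with the trained adapter might look like:

```python
from predibase import Predibase, DeploymentConfig

pb = Predibase(api_token="<API_TOKEN>")

# Assumed: create a private serverless deployment for the custom base model.
pb.deployments.create(
    name="biomistral-7b",
    config=DeploymentConfig(base_model="BioMistral/BioMistral-7B"),
)

# Assumed: prompt the deployment with the fine-tuned adapter attached.
client = pb.deployments.client("biomistral-7b")
response = client.generate(
    "Summarize the following article: ...",
    adapter_id="bio-summarizer/1",
    max_new_tokens=256,
)
print(response.generated_text)
```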