Well-supported LLMs
The following models are currently well-supported for fine-tuning.
Large Language Models (Text)
You may fine-tune LoRA, Turbo LoRA, and Turbo adapters on any of these base LLMs, but note that for Turbo LoRA and Turbo adapters, some models may require additional deployment configuration.
Qwen Models
Model Name | Parameters | Context Window | Supported Fine-Tuning Context Window | Adapter Pre-load Not Required | Supports GRPO | Architecture | License |
---|---|---|---|---|---|---|---|
qwen3-30b-a3b | 30.5 billion | 131072 | 32768 | ❌ | ✅ | Qwen | Tongyi Qianwen |
qwen3-32b | 32.8 billion | 131072 | 32768 | ❌ | ✅ | Qwen | Tongyi Qianwen |
qwen3-14b | 14.8 billion | 131072 | 32768 | ❌ | ✅ | Qwen | Tongyi Qianwen |
qwen3-8b | 8.19 billion | 131072 | 32768 | ❌ | ✅ | Qwen | Tongyi Qianwen |
qwen2-5-32b | 32.8 billion | 131072 | 16384 | ❌ | ✅ | Qwen | Tongyi Qianwen |
qwen2-5-32b-instruct | 32.8 billion | 32768 | 16384 | ❌ | ✅ | Qwen | Tongyi Qianwen |
qwen2-5-14b | 14.8 billion | 131072 | 32768 | ❌ | ✅ | Qwen | Tongyi Qianwen |
qwen2-5-14b-instruct | 14.8 billion | 32768 | 32768 | ❌ | ✅ | Qwen | Tongyi Qianwen |
qwen2-5-7b | 7.62 billion | 131072 | 32768 | ❌ | ✅ | Qwen | Tongyi Qianwen |
qwen2-5-7b-instruct | 7.62 billion | 32768 | 32768 | ❌ | ✅ | Qwen | Tongyi Qianwen |
qwen2-5-3b | 3.09 billion | 131072 | 32768 | ❌ | ✅ | Qwen | Tongyi Qianwen |
qwen2-5-3b-instruct | 3.09 billion | 32768 | 32768 | ❌ | ✅ | Qwen | Tongyi Qianwen |
qwen2-5-1-5b | 1.54 billion | 131072 | 32768 | ❌ | ✅ | Qwen | Tongyi Qianwen |
qwen2-5-1-5b-instruct | 1.54 billion | 32768 | 32768 | ❌ | ✅ | Qwen | Tongyi Qianwen |
qwen2-5-coder-32b-instruct | 32.8 billion | 131072 | 16384 | ❌ | ✅ | Qwen | Tongyi Qianwen |
qwen2-5-coder-7b-instruct | 7.62 billion | 131072 | 16384 | ❌ | ✅ | Qwen | Tongyi Qianwen |
qwen2-5-coder-3b-instruct | 3.09 billion | 131072 | 16384 | ❌ | ✅ | Qwen | Tongyi Qianwen |
qwen2-7b | 7.62 billion | 131072 | 32768 | ❌ | ❌ | Qwen | Tongyi Qianwen |
qwen2-1-5b-instruct | 1.54 billion | 131072 | 32768 | ❌ | ❌ | Qwen | Tongyi Qianwen |
qwen2-1-5b | 1.54 billion | 131072 | 32768 | ❌ | ❌ | Qwen | Tongyi Qianwen |
Llama 3 Models
Model Name | Parameters | Context Window | Supported Fine-Tuning Context Window | Adapter Pre-load Not Required | Supports GRPO | Architecture | License |
---|---|---|---|---|---|---|---|
llama-3-3-70b | 70.6 billion | 131072 | 8192 | ❌ | ❌ | Llama-3 | Meta (request for commercial use) |
llama-3-2-3b | 3.21 billion | 32768 | 32768 | ❌ | ✅ | Llama-3 | Meta (request for commercial use) |
llama-3-2-3b-instruct | 3.21 billion | 32768 | 32768 | ❌ | ✅ | Llama-3 | Meta (request for commercial use) |
llama-3-2-1b | 1.24 billion | 32768 | 32768 | ❌ | ✅ | Llama-3 | Meta (request for commercial use) |
llama-3-2-1b-instruct | 1.24 billion | 32768 | 32768 | ❌ | ✅ | Llama-3 | Meta (request for commercial use) |
llama-3-1-8b | 8 billion | 62999 | 32768 | ❌ | ✅ | Llama-3 | Meta (request for commercial use) |
llama-3-1-8b-instruct | 8 billion | 62999 | 32768 | ✅ | ✅ | Llama-3 | Meta (request for commercial use) |
llama-3-70b | 70 billion | 8192 | 8192 | ❌ | ❌ | Llama-3 | Meta (request for commercial use) |
llama-3-70b-instruct | 70 billion | 8192 | 8192 | ❌ | ❌ | Llama-3 | Meta (request for commercial use) |
llama-3-8b | 8 billion | 8192 | 8192 | ❌ | ❌ | Llama-3 | Meta (request for commercial use) |
llama-3-8b-instruct | 8 billion | 8192 | 8192 | ❌ | ❌ | Llama-3 | Meta (request for commercial use) |
Llama 2 Models
Model Name | Parameters | Context Window | Supported Fine-Tuning Context Window | Adapter Pre-load Not Required | Supports GRPO | Architecture | License |
---|---|---|---|---|---|---|---|
llama-2-70b | 70 billion | 4096 | 4096 | ❌ | ❌ | Llama-2 | Meta (request for commercial use) |
llama-2-70b-chat | 70 billion | 4096 | 4096 | ❌ | ❌ | Llama-2 | Meta (request for commercial use) |
llama-2-13b | 13 billion | 4096 | 4096 | ❌ | ❌ | Llama-2 | Meta (request for commercial use) |
llama-2-13b-chat | 13 billion | 4096 | 4096 | ❌ | ❌ | Llama-2 | Meta (request for commercial use) |
llama-2-7b | 7 billion | 4096 | 4096 | ❌ | ❌ | Llama-2 | Meta (request for commercial use) |
llama-2-7b-chat | 7 billion | 4096 | 4096 | ❌ | ❌ | Llama-2 | Meta (request for commercial use) |
CodeLlama Models
Model Name | Parameters | Context Window | Supported Fine-Tuning Context Window | Adapter Pre-load Not Required | Supports GRPO | Architecture | License |
---|---|---|---|---|---|---|---|
codellama-70b-instruct | 70 billion | 4096 | 4096 | ❌ | ❌ | Llama-2 | Meta (request for commercial use) |
codellama-13b-instruct | 13 billion | 16384 | 16384 | ❌ | ❌ | Llama-2 | Meta (request for commercial use) |
codellama-7b | 7 billion | 4096 | 4096 | ❌ | ❌ | Llama-2 | Meta (request for commercial use) |
codellama-7b-instruct | 7 billion | 4096 | 4096 | ❌ | ❌ | Llama-2 | Meta (request for commercial use) |
Mistral Models
Model Name | Parameters | Context Window | Supported Fine-Tuning Context Window | Adapter Pre-load Not Required | Supports GRPO | Architecture | License |
---|---|---|---|---|---|---|---|
mistral-7b-instruct-v0-3 | 7 billion | 32768 | 32768 | ✅ | ❌ | Mistral | Apache 2.0 |
mistral-7b-instruct-v0-2 | 7 billion | 32768 | 32768 | ✅ | ❌ | Mistral | Apache 2.0 |
mistral-7b | 7 billion | 32768 | 32768 | ✅ | ❌ | Mistral | Apache 2.0 |
mistral-7b-instruct | 7 billion | 32768 | 32768 | ✅ | ❌ | Mistral | Apache 2.0 |
mistral-nemo-12b-2407 | 12 billion | 131072 | 32768 | ❌ | ❌ | Mistral | Apache 2.0 |
mistral-nemo-12b-instruct-2407 | 12 billion | 131072 | 32768 | ❌ | ❌ | Mistral | Apache 2.0 |
zephyr-7b-beta | 7 billion | 32768 | 32768 | ✅ | ❌ | Mistral | MIT |
mixtral-8x7b-instruct-v0-1 | 46.7 billion | 32768 | 7168 | ❌ | ❌ | Mixtral | Apache 2.0 |
Solar Models
Model Name | Parameters | Context Window | Supported Fine-Tuning Context Window | Adapter Pre-load Not Required | Supports GRPO | Architecture | License |
---|---|---|---|---|---|---|---|
solar-1-mini-chat-240612 | 10.7 billion | 32768 | 32768 | ✅ | ❌ | Llama | Custom License |
solar-pro-preview-instruct-v2 | 22.1 billion | 4096 | 4096 | ❌ | ❌ | Solar | Custom License |
solar-pro-241126 | 22.1 billion | 32768 | 16384 | ❌ | ❌ | Solar | Custom License |
Gemma Models
Model Name | Parameters | Context Window | Supported Fine-Tuning Context Window | Adapter Pre-load Not Required | Supports GRPO | Architecture | License |
---|---|---|---|---|---|---|---|
gemma-2-27b | 27.2 billion | 8192 | 4096 | ❌ | ❌ | Gemma | |
gemma-2-27b-instruct | 27.2 billion | 8192 | 4096 | ❌ | ❌ | Gemma | |
gemma-2-9b | 9.24 billion | 8192 | 8192 | ❌ | ❌ | Gemma | |
gemma-2-9b-instruct | 9.24 billion | 8192 | 8192 | ❌ | ❌ | Gemma | |
gemma-7b | 8.5 billion | 8192 | 8192 | ❌ | ❌ | Gemma | |
gemma-7b-instruct | 8.5 billion | 8192 | 8192 | ❌ | ❌ | Gemma | |
gemma-2b | 2.5 billion | 8192 | 8192 | ❌ | ❌ | Gemma | |
gemma-2b-instruct | 2.5 billion | 8192 | 8192 | ❌ | ❌ | Gemma | |
Phi Models
Model Name | Parameters | Context Window | Supported Fine-Tuning Context Window | Adapter Pre-load Not Required | Supports GRPO | Architecture | License |
---|---|---|---|---|---|---|---|
phi-3-5-mini-instruct | 3.8 billion | 131072 | 16384 | ❌ | ✅ | Phi-3 | Microsoft |
phi-3-mini-4k-instruct | 3.8 billion | 4096 | 4096 | Turbo LoRA not supported | ✅ | Phi-3 | Microsoft |
phi-2 | 2.7 billion | 2048 | 2048 | Turbo not supported | ✅ | Phi-2 | Microsoft |
Other Models
Model Name | Parameters | Context Window | Supported Fine-Tuning Context Window | Adapter Pre-load Not Required | Supports GRPO | Architecture | License |
---|---|---|---|---|---|---|---|
deepseek-r1-distill-qwen-32b | 32.8 billion | 131072 | 8000 | ❌ | ✅ | Qwen | DeepSeek-AI |
openhands-lm-32b-v0.1 | 32.8 billion | 131072 | 16384 | ❌ | ✅ | Qwen | Xingyao Wang |
- Base models (llama-2-7b, etc.): models primarily trained on the objective of text completion.
- Instruction-tuned models (llama-2-7b-chat, mistral-7b-instruct, etc.): models further trained on (instruction, output) pairs so that they respond better to human instruction-styled inputs. The instruction tuning effectively constrains the model’s output to align with the desired response characteristics or domain knowledge.
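The practical difference shows up in how you prompt the two variants. As a minimal sketch (the helper functions are our own; the `[INST]`/`<<SYS>>` wrapper shown is the documented Llama-2-chat template, used here purely for illustration):

```python
# Base models continue raw text; instruction-tuned models expect the chat
# template they were trained with. Helper names below are illustrative only.

def base_prompt(text: str) -> str:
    # A base model is prompted with plain text and simply continues it.
    return text

def llama2_chat_prompt(instruction: str,
                       system: str = "You are a helpful assistant.") -> str:
    # Llama-2-chat expects its training-time [INST] ... [/INST] wrapper.
    return f"<s>[INST] <<SYS>>\n{system}\n<</SYS>>\n\n{instruction} [/INST]"

print(base_prompt("The capital of France is"))
print(llama2_chat_prompt("What is the capital of France?"))
```

Sending a bare completion-style prompt to a chat model (or vice versa) usually still produces output, but quality degrades because the input no longer matches the training distribution.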
Vision Language Models
Currently, we support fine-tuning LoRAs for the following vision language models:
Qwen VL
Model Name | Parameters | Architecture | License | LoRA | Turbo LoRA | Turbo | Context Window | Supported Fine-Tuning Context Window | Supports GRPO |
---|---|---|---|---|---|---|---|---|---|
qwen2-vl-7b-instruct | 7B | Qwen2 | Tongyi Qianwen | ✅ | ❌ | ❌ | 128000 | 32768 | ❌ |
qwen2-5-vl-3b-instruct | 3B | Qwen2.5 | Tongyi Qianwen | ✅ | ❌ | ❌ | 128000 | 32768 | ❌ |
qwen2-5-vl-7b-instruct | 7B | Qwen2.5 | Tongyi Qianwen | ✅ | ❌ | ❌ | 128000 | 32768 | ❌ |
Best-Effort LLMs (via HuggingFace)
Best-effort fine-tuning support is also offered for any Huggingface LLM that meets the following criteria:
- Has the “Text Generation” and “Transformer” tags
- Does not have a “custom_code” tag
- Is not post-quantized (e.g. the model name does not contain a quantization method such as “AWQ”)
- Has text inputs and outputs
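The criteria above can be checked programmatically before kicking off a job. The sketch below assumes Huggingface Hub tag conventions (`text-generation`, `transformers`, `custom_code`); the function name and the list of quantization markers are our own illustration, not part of any SDK:

```python
# Hypothetical pre-flight check for the best-effort criteria listed above.
QUANT_MARKERS = ("awq", "gptq", "gguf")  # common post-quantization markers (assumption)

def is_best_effort_candidate(model_id: str, tags: list[str]) -> bool:
    tags = [t.lower() for t in tags]
    name = model_id.lower()
    return (
        "text-generation" in tags          # has the "Text Generation" tag
        and "transformers" in tags         # has the "Transformer" tag
        and "custom_code" not in tags      # no custom modeling code
        and not any(q in name for q in QUANT_MARKERS)  # not post-quantized
    )

print(is_best_effort_candidate("BioMistral/BioMistral-7B",
                               ["text-generation", "transformers"]))   # True
print(is_best_effort_candidate("TheBloke/Mistral-7B-AWQ",
                               ["text-generation", "transformers"]))   # False
```

In practice you would pull the tag list from the model card (e.g. via the Hub API) rather than hard-coding it; the "text inputs and outputs" criterion still needs a manual check of the model card.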
Fine-tuning a custom LLM
- Get the Huggingface ID for your model by clicking the copy icon on the base model’s Huggingface page, e.g. `BioMistral/BioMistral-7B`.
- Pass the Huggingface ID as the `base_model`.
- Pass `show_tensorboard=True` to the `create` call.
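The steps above can be sketched as follows. Only `base_model` and `show_tensorboard=True` come from this page; the client and method names below are placeholders, not the exact SDK surface, so check the SDK reference for the real `create` signature:

```python
# Sketch of the custom-LLM fine-tuning steps, assuming a `create` call that
# accepts `base_model` and `show_tensorboard`. Names here are illustrative.

def build_finetune_request(base_model: str, show_tensorboard: bool = True) -> dict:
    # Collect the arguments that would be forwarded to the create call.
    return {"base_model": base_model, "show_tensorboard": show_tensorboard}

# Step 1: copy the Huggingface ID from the model page.
# Step 2: pass it as the base_model.
request = build_finetune_request("BioMistral/BioMistral-7B")
print(request)

# Placeholder for the actual SDK call, e.g.:
# client.adapters.create(**request, dataset=..., repo=...)
```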
Note that if you fine-tune a custom model not on our shared deployments list, you’ll need to deploy the custom base model as a private serverless deployment in order to run inference on your newly trained adapter. This is also supported on a best-effort basis.