Well-supported LLMs
The following models are currently well-supported for fine-tuning.

Large Language Models (Text)
You may fine-tune LoRA, Turbo LoRA, and Turbo adapters on any of these base LLMs, but note that for Turbo LoRA and Turbo adapters, some models may require additional deployment configurations.

Qwen Models
| Model Name | Parameters | Context Window | Supported Fine-Tuning Context Window | Adapter Pre-load Not Required | Supports GRPO | Architecture | License |
|---|---|---|---|---|---|---|---|
| qwen3-30b-a3b | 30.5 billion | 131072 | 32768 | ❌ | ✅ | Qwen | Tongyi Qianwen |
| qwen3-32b | 32.8 billion | 131072 | 32768 | ❌ | ✅ | Qwen | Tongyi Qianwen |
| qwen3-14b | 14.8 billion | 131072 | 32768 | ❌ | ✅ | Qwen | Tongyi Qianwen |
| qwen3-8b | 8.19 billion | 131072 | 32768 | ❌ | ✅ | Qwen | Tongyi Qianwen |
| qwen2-5-32b | 32.8 billion | 131072 | 16384 | ❌ | ✅ | Qwen | Tongyi Qianwen |
| qwen2-5-32b-instruct | 32.8 billion | 32768 | 16384 | ❌ | ✅ | Qwen | Tongyi Qianwen |
| qwen2-5-14b | 14.8 billion | 131072 | 32768 | ❌ | ✅ | Qwen | Tongyi Qianwen |
| qwen2-5-14b-instruct | 14.8 billion | 32768 | 32768 | ❌ | ✅ | Qwen | Tongyi Qianwen |
| qwen2-5-7b | 7.62 billion | 131072 | 32768 | ❌ | ✅ | Qwen | Tongyi Qianwen |
| qwen2-5-7b-instruct | 7.62 billion | 32768 | 32768 | ❌ | ✅ | Qwen | Tongyi Qianwen |
| qwen2-5-3b | 3.09 billion | 131072 | 32768 | ❌ | ✅ | Qwen | Tongyi Qianwen |
| qwen2-5-3b-instruct | 3.09 billion | 32768 | 32768 | ❌ | ✅ | Qwen | Tongyi Qianwen |
| qwen2-5-1-5b | 1.54 billion | 131072 | 32768 | ❌ | ✅ | Qwen | Tongyi Qianwen |
| qwen2-5-1-5b-instruct | 1.54 billion | 32768 | 32768 | ❌ | ✅ | Qwen | Tongyi Qianwen |
| qwen2-5-coder-32b-instruct | 32.8 billion | 131072 | 16384 | ❌ | ✅ | Qwen | Tongyi Qianwen |
| qwen2-5-coder-7b-instruct | 7.62 billion | 131072 | 16384 | ❌ | ✅ | Qwen | Tongyi Qianwen |
| qwen2-5-coder-3b-instruct | 3.09 billion | 131072 | 16384 | ❌ | ✅ | Qwen | Tongyi Qianwen |
| qwen2-7b | 7.62 billion | 131072 | 32768 | ❌ | ❌ | Qwen | Tongyi Qianwen |
| qwen2-1-5b-instruct | 1.54 billion | 131072 | 32768 | ❌ | ❌ | Qwen | Tongyi Qianwen |
| qwen2-1-5b | 1.54 billion | 131072 | 32768 | ❌ | ❌ | Qwen | Tongyi Qianwen |
Llama 3 Models
| Model Name | Parameters | Context Window | Supported Fine-Tuning Context Window | Adapter Pre-load Not Required | Supports GRPO | Architecture | License |
|---|---|---|---|---|---|---|---|
| llama-3-3-70b | 70.6 billion | 131072 | 8192 | ❌ | ❌ | Llama-3 | Meta (request for commercial use) |
| llama-3-2-3b | 3.21 billion | 32768 | 32768 | ❌ | ✅ | Llama-3 | Meta (request for commercial use) |
| llama-3-2-3b-instruct | 3.21 billion | 32768 | 32768 | ❌ | ✅ | Llama-3 | Meta (request for commercial use) |
| llama-3-2-1b | 1.24 billion | 32768 | 32768 | ❌ | ✅ | Llama-3 | Meta (request for commercial use) |
| llama-3-2-1b-instruct | 1.24 billion | 32768 | 32768 | ❌ | ✅ | Llama-3 | Meta (request for commercial use) |
| llama-3-1-8b | 8 billion | 62999 | 32768 | ❌ | ✅ | Llama-3 | Meta (request for commercial use) |
| llama-3-1-8b-instruct | 8 billion | 62999 | 32768 | ✅ | ✅ | Llama-3 | Meta (request for commercial use) |
| llama-3-70b | 70 billion | 8192 | 8192 | ❌ | ❌ | Llama-3 | Meta (request for commercial use) |
| llama-3-70b-instruct | 70 billion | 8192 | 8192 | ❌ | ❌ | Llama-3 | Meta (request for commercial use) |
| llama-3-8b | 8 billion | 8192 | 8192 | ❌ | ❌ | Llama-3 | Meta (request for commercial use) |
| llama-3-8b-instruct | 8 billion | 8192 | 8192 | ❌ | ❌ | Llama-3 | Meta (request for commercial use) |
Llama 2 Models
| Model Name | Parameters | Context Window | Supported Fine-Tuning Context Window | Adapter Pre-load Not Required | Supports GRPO | Architecture | License |
|---|---|---|---|---|---|---|---|
| llama-2-70b | 70 billion | 4096 | 4096 | ❌ | ❌ | Llama-2 | Meta (request for commercial use) |
| llama-2-70b-chat | 70 billion | 4096 | 4096 | ❌ | ❌ | Llama-2 | Meta (request for commercial use) |
| llama-2-13b | 13 billion | 4096 | 4096 | ❌ | ❌ | Llama-2 | Meta (request for commercial use) |
| llama-2-13b-chat | 13 billion | 4096 | 4096 | ❌ | ❌ | Llama-2 | Meta (request for commercial use) |
| llama-2-7b | 7 billion | 4096 | 4096 | ❌ | ❌ | Llama-2 | Meta (request for commercial use) |
| llama-2-7b-chat | 7 billion | 4096 | 4096 | ❌ | ❌ | Llama-2 | Meta (request for commercial use) |
CodeLlama Models
| Model Name | Parameters | Context Window | Supported Fine-Tuning Context Window | Adapter Pre-load Not Required | Supports GRPO | Architecture | License |
|---|---|---|---|---|---|---|---|
| codellama-70b-instruct | 70 billion | 4096 | 4096 | ❌ | ❌ | Llama-2 | Meta (request for commercial use) |
| codellama-13b-instruct | 13 billion | 16384 | 16384 | ❌ | ❌ | Llama-2 | Meta (request for commercial use) |
| codellama-7b | 7 billion | 4096 | 4096 | ❌ | ❌ | Llama-2 | Meta (request for commercial use) |
| codellama-7b-instruct | 7 billion | 4096 | 4096 | ❌ | ❌ | Llama-2 | Meta (request for commercial use) |
Mistral Models
| Model Name | Parameters | Context Window | Supported Fine-Tuning Context Window | Adapter Pre-load Not Required | Supports GRPO | Architecture | License |
|---|---|---|---|---|---|---|---|
| mistral-7b-instruct-v0-3 | 7 billion | 32768 | 32768 | ✅ | ❌ | Mistral | Apache 2.0 |
| mistral-7b-instruct-v0-2 | 7 billion | 32768 | 32768 | ✅ | ❌ | Mistral | Apache 2.0 |
| mistral-7b | 7 billion | 32768 | 32768 | ✅ | ❌ | Mistral | Apache 2.0 |
| mistral-7b-instruct | 7 billion | 32768 | 32768 | ✅ | ❌ | Mistral | Apache 2.0 |
| mistral-nemo-12b-2407 | 12 billion | 131072 | 32768 | ❌ | ❌ | Mistral | Apache 2.0 |
| mistral-nemo-12b-instruct-2407 | 12 billion | 131072 | 32768 | ❌ | ❌ | Mistral | Apache 2.0 |
| zephyr-7b-beta | 7 billion | 32768 | 32768 | ✅ | ❌ | Mistral | MIT |
| mixtral-8x7b-instruct-v0-1 | 46.7 billion | 32768 | 7168 | ❌ | ❌ | Mixtral | Apache 2.0 |
Solar Models
| Model Name | Parameters | Context Window | Supported Fine-Tuning Context Window | Adapter Pre-load Not Required | Supports GRPO | Architecture | License |
|---|---|---|---|---|---|---|---|
| solar-1-mini-chat-240612 | 10.7 billion | 32768 | 32768 | ✅ | ❌ | Llama | Custom License |
| solar-pro-preview-instruct-v2 | 22.1 billion | 4096 | 4096 | ❌ | ❌ | Solar | Custom License |
| solar-pro-241126 | 22.1 billion | 32768 | 16384 | ❌ | ❌ | Solar | Custom License |
Gemma Models
| Model Name | Parameters | Context Window | Supported Fine-Tuning Context Window | Adapter Pre-load Not Required | Supports GRPO | Architecture | License |
|---|---|---|---|---|---|---|---|
| gemma-2-27b | 27.2 billion | 8192 | 4096 | ❌ | ❌ | Gemma | |
| gemma-2-27b-instruct | 27.2 billion | 8192 | 4096 | ❌ | ❌ | Gemma | |
| gemma-2-9b | 9.24 billion | 8192 | 8192 | ❌ | ❌ | Gemma | |
| gemma-2-9b-instruct | 9.24 billion | 8192 | 8192 | ❌ | ❌ | Gemma | |
| gemma-7b | 8.5 billion | 8192 | 8192 | ❌ | ❌ | Gemma | |
| gemma-7b-instruct | 8.5 billion | 8192 | 8192 | ❌ | ❌ | Gemma | |
| gemma-2b | 2.5 billion | 8192 | 8192 | ❌ | ❌ | Gemma | |
| gemma-2b-instruct | 2.5 billion | 8192 | 8192 | ❌ | ❌ | Gemma | |
Phi Models
| Model Name | Parameters | Context Window | Supported Fine-Tuning Context Window | Adapter Pre-load Not Required | Supports GRPO | Architecture | License |
|---|---|---|---|---|---|---|---|
| phi-3-5-mini-instruct | 3.8 billion | 131072 | 16384 | ❌ | ✅ | Phi-3 | Microsoft |
| phi-3-mini-4k-instruct | 3.8 billion | 4096 | 4096 | Turbo LoRA not supported | ✅ | Phi-3 | Microsoft |
| phi-2 | 2.7 billion | 2048 | 2048 | Turbo not supported | ✅ | Phi-2 | Microsoft |
Other Models
| Model Name | Parameters | Context Window | Supported Fine-Tuning Context Window | Adapter Pre-load Not Required | Supports GRPO | Architecture | License |
|---|---|---|---|---|---|---|---|
| deepseek-r1-distill-qwen-32b | 32.8 billion | 131072 | 8000 | ❌ | ✅ | Qwen | DeepSeek-AI |
| openhands-lm-32b-v0.1 | 32.8 billion | 131072 | 16384 | ❌ | ✅ | Qwen | Xingyao Wang |
- Base models (llama-2-7b, etc.): models trained primarily on the text-completion objective.
- Instruction-tuned models (llama-2-7b-chat, mistral-7b-instruct, etc.): models further trained on (instruction, output) pairs so that they respond better to human instruction-style inputs. The instructions effectively constrain the model’s output to align with the desired response characteristics or domain knowledge.
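This distinction shapes how you format prompts and training data. A minimal illustration (the `[INST]` wrapper shown is the Llama-2-chat convention and is only one example; other instruct models use different templates):

```python
# Base model: trained for plain text completion, so the prompt is simply
# the text you want the model to continue.
base_prompt = "The capital of France is"

# Instruction-tuned model: expects an instruction-style wrapper.
# The [INST] ... [/INST] format below follows the Llama-2-chat template;
# check your target model's documentation for its own template.
def llama2_chat_prompt(instruction: str) -> str:
    return f"<s>[INST] {instruction} [/INST]"

prompt = llama2_chat_prompt("What is the capital of France?")
```

Sending a bare completion-style prompt to an instruct model (or vice versa) typically degrades output quality, which is why matching your dataset format to the base model family matters.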
Vision Language Models
Currently, we support fine-tuning LoRAs for the following vision language models:

Qwen VL
| Model Name | Parameters | Architecture | License | LoRA | Turbo LoRA | Turbo | Context Window | Supported Fine-Tuning Context Window | Supports GRPO |
|---|---|---|---|---|---|---|---|---|---|
| qwen2-vl-7b-instruct | 7B | Qwen2 | Tongyi Qianwen | ✅ | ❌ | ❌ | 128000 | 32768 | ❌ |
| qwen2-5-vl-3b-instruct | 3B | Qwen2.5 | Tongyi Qianwen | ✅ | ❌ | ❌ | 128000 | 32768 | ❌ |
| qwen2-5-vl-7b-instruct | 7B | Qwen2.5 | Tongyi Qianwen | ✅ | ❌ | ❌ | 128000 | 32768 | ❌ |
Best-Effort LLMs (via HuggingFace)
Best-effort support for fine-tuning is also offered for any Huggingface LLM meeting the following criteria:
- Has the “Text Generation” and “Transformer” tags
- Does not have a “custom_code” tag
- Is not post-quantized (e.g., a model containing a quantization method such as “AWQ” in the name)
- Has text inputs and outputs
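The criteria above can be screened programmatically before submitting a job. A sketch of such a check (the tag strings and the substring heuristic for quantized checkpoints are assumptions for illustration, not an official eligibility API):

```python
# Name fragments that usually indicate a post-quantized checkpoint.
# This substring heuristic is an assumption, not an official rule.
QUANT_MARKERS = ("awq", "gptq", "gguf")

def is_best_effort_eligible(tags: list[str], model_id: str) -> bool:
    """Return True if a Huggingface model looks eligible for best-effort fine-tuning."""
    tags = [t.lower() for t in tags]
    if "custom_code" in tags:
        return False  # models requiring custom modeling code are excluded
    if not {"text-generation", "transformers"} <= set(tags):
        return False  # must be a Transformers text-generation model
    if any(marker in model_id.lower() for marker in QUANT_MARKERS):
        return False  # post-quantized checkpoints are excluded
    return True

# BioMistral/BioMistral-7B is the example ID used in the steps below;
# "some-org/model-7B-AWQ" is a made-up ID illustrating a quantized checkpoint.
assert is_best_effort_eligible(
    ["text-generation", "transformers"], "BioMistral/BioMistral-7B")
assert not is_best_effort_eligible(
    ["text-generation", "transformers"], "some-org/model-7B-AWQ")
```

A local pre-check like this cannot guarantee a job will succeed (that is what "best-effort" means), but it catches the common disqualifiers early.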
Fine-tuning a custom LLM
- Get the Huggingface ID for your model by clicking the copy icon on the base model’s Huggingface page, e.g. BioMistral/BioMistral-7B.
- Pass the Huggingface ID as the base_model.
- Optionally, pass show_tensorboard=True to the create call to view training metrics.
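Put together, these steps might look like the sketch below. The shape of the create call and the config field names are assumptions for illustration; consult the SDK reference for the exact signature.

```python
# Sketch only: the create(...) call is commented out because its exact
# signature is an assumption about the SDK, not documented here.
config = {
    "base_model": "BioMistral/BioMistral-7B",  # Huggingface ID copied from the model page
    "show_tensorboard": True,                  # surface training curves during the run
}

# adapter = pb.adapters.create(config=config, ...)  # hypothetical create call
```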
Note that if you fine-tune a custom model not on our shared deployments list, you’ll need to deploy the custom base model as a private serverless deployment in order to run inference on your newly trained adapter. This is also supported on a best-effort basis.