Models available for fine-tuning in Predibase
Model Name | Parameters | Context Window | Supported Fine-Tuning Context Window | Adapter Pre-load Not Required | Supports GRPO | Architecture | License |
---|---|---|---|---|---|---|---|
qwen3-30b-a3b | 30.5 billion | 131072 | 32768 | ❌ | ✅ | Qwen | Tongyi Qianwen |
qwen3-32b | 32.8 billion | 131072 | 32768 | ❌ | ✅ | Qwen | Tongyi Qianwen |
qwen3-14b | 14.8 billion | 131072 | 32768 | ❌ | ✅ | Qwen | Tongyi Qianwen |
qwen3-8b | 8.19 billion | 131072 | 32768 | ❌ | ✅ | Qwen | Tongyi Qianwen |
qwen2-5-32b | 32.8 billion | 131072 | 16384 | ❌ | ✅ | Qwen | Tongyi Qianwen |
qwen2-5-32b-instruct | 32.8 billion | 32768 | 16384 | ❌ | ✅ | Qwen | Tongyi Qianwen |
qwen2-5-14b | 14.8 billion | 131072 | 32768 | ❌ | ✅ | Qwen | Tongyi Qianwen |
qwen2-5-14b-instruct | 14.8 billion | 32768 | 32768 | ❌ | ✅ | Qwen | Tongyi Qianwen |
qwen2-5-7b | 7.62 billion | 131072 | 32768 | ❌ | ✅ | Qwen | Tongyi Qianwen |
qwen2-5-7b-instruct | 7.62 billion | 32768 | 32768 | ❌ | ✅ | Qwen | Tongyi Qianwen |
qwen2-5-3b | 3.09 billion | 131072 | 32768 | ❌ | ✅ | Qwen | Tongyi Qianwen |
qwen2-5-3b-instruct | 3.09 billion | 32768 | 32768 | ❌ | ✅ | Qwen | Tongyi Qianwen |
qwen2-5-1-5b | 1.54 billion | 131072 | 32768 | ❌ | ✅ | Qwen | Tongyi Qianwen |
qwen2-5-1-5b-instruct | 1.54 billion | 32768 | 32768 | ❌ | ✅ | Qwen | Tongyi Qianwen |
qwen2-5-coder-32b-instruct | 32.8 billion | 131072 | 16384 | ❌ | ✅ | Qwen | Tongyi Qianwen |
qwen2-5-coder-7b-instruct | 7.62 billion | 131072 | 16384 | ❌ | ✅ | Qwen | Tongyi Qianwen |
qwen2-5-coder-3b-instruct | 3.09 billion | 131072 | 16384 | ❌ | ✅ | Qwen | Tongyi Qianwen |
qwen2-7b | 7.62 billion | 131072 | 32768 | ❌ | ❌ | Qwen | Tongyi Qianwen |
qwen2-1-5b-instruct | 1.54 billion | 131072 | 32768 | ❌ | ❌ | Qwen | Tongyi Qianwen |
qwen2-1-5b | 1.54 billion | 131072 | 32768 | ❌ | ❌ | Qwen | Tongyi Qianwen |
Model Name | Parameters | Context Window | Supported Fine-Tuning Context Window | Adapter Pre-load Not Required | Supports GRPO | Architecture | License |
---|---|---|---|---|---|---|---|
llama-3-3-70b | 70.6 billion | 131072 | 8192 | ❌ | ❌ | Llama-3 | Meta (request for commercial use) |
llama-3-2-3b | 3.21 billion | 32768 | 32768 | ❌ | ✅ | Llama-3 | Meta (request for commercial use) |
llama-3-2-3b-instruct | 3.21 billion | 32768 | 32768 | ❌ | ✅ | Llama-3 | Meta (request for commercial use) |
llama-3-2-1b | 1.24 billion | 32768 | 32768 | ❌ | ✅ | Llama-3 | Meta (request for commercial use) |
llama-3-2-1b-instruct | 1.24 billion | 32768 | 32768 | ❌ | ✅ | Llama-3 | Meta (request for commercial use) |
llama-3-1-8b | 8 billion | 62999 | 32768 | ❌ | ✅ | Llama-3 | Meta (request for commercial use) |
llama-3-1-8b-instruct | 8 billion | 62999 | 32768 | ✅ | ✅ | Llama-3 | Meta (request for commercial use) |
llama-3-70b | 70 billion | 8192 | 8192 | ❌ | ❌ | Llama-3 | Meta (request for commercial use) |
llama-3-70b-instruct | 70 billion | 8192 | 8192 | ❌ | ❌ | Llama-3 | Meta (request for commercial use) |
llama-3-8b | 8 billion | 8192 | 8192 | ❌ | ❌ | Llama-3 | Meta (request for commercial use) |
llama-3-8b-instruct | 8 billion | 8192 | 8192 | ❌ | ❌ | Llama-3 | Meta (request for commercial use) |
Model Name | Parameters | Context Window | Supported Fine-Tuning Context Window | Adapter Pre-load Not Required | Supports GRPO | Architecture | License |
---|---|---|---|---|---|---|---|
llama-2-70b | 70 billion | 4096 | 4096 | ❌ | ❌ | Llama-2 | Meta (request for commercial use) |
llama-2-70b-chat | 70 billion | 4096 | 4096 | ❌ | ❌ | Llama-2 | Meta (request for commercial use) |
llama-2-13b | 13 billion | 4096 | 4096 | ❌ | ❌ | Llama-2 | Meta (request for commercial use) |
llama-2-13b-chat | 13 billion | 4096 | 4096 | ❌ | ❌ | Llama-2 | Meta (request for commercial use) |
llama-2-7b | 7 billion | 4096 | 4096 | ❌ | ❌ | Llama-2 | Meta (request for commercial use) |
llama-2-7b-chat | 7 billion | 4096 | 4096 | ❌ | ❌ | Llama-2 | Meta (request for commercial use) |
Model Name | Parameters | Context Window | Supported Fine-Tuning Context Window | Adapter Pre-load Not Required | Supports GRPO | Architecture | License |
---|---|---|---|---|---|---|---|
codellama-70b-instruct | 70 billion | 4096 | 4096 | ❌ | ❌ | Llama-2 | Meta (request for commercial use) |
codellama-13b-instruct | 13 billion | 16384 | 16384 | ❌ | ❌ | Llama-2 | Meta (request for commercial use) |
codellama-7b | 7 billion | 4096 | 4096 | ❌ | ❌ | Llama-2 | Meta (request for commercial use) |
codellama-7b-instruct | 7 billion | 4096 | 4096 | ❌ | ❌ | Llama-2 | Meta (request for commercial use) |
Model Name | Parameters | Context Window | Supported Fine-Tuning Context Window | Adapter Pre-load Not Required | Supports GRPO | Architecture | License |
---|---|---|---|---|---|---|---|
mistral-7b-instruct-v0-3 | 7 billion | 32768 | 32768 | ✅ | ❌ | Mistral | Apache 2.0 |
mistral-7b-instruct-v0-2 | 7 billion | 32768 | 32768 | ✅ | ❌ | Mistral | Apache 2.0 |
mistral-7b | 7 billion | 32768 | 32768 | ✅ | ❌ | Mistral | Apache 2.0 |
mistral-7b-instruct | 7 billion | 32768 | 32768 | ✅ | ❌ | Mistral | Apache 2.0 |
mistral-nemo-12b-2407 | 12 billion | 131072 | 32768 | ❌ | ❌ | Mistral | Apache 2.0 |
mistral-nemo-12b-instruct-2407 | 12 billion | 131072 | 32768 | ❌ | ❌ | Mistral | Apache 2.0 |
zephyr-7b-beta | 7 billion | 32768 | 32768 | ✅ | ❌ | Mistral | MIT |
mixtral-8x7b-instruct-v0-1 | 46.7 billion | 32768 | 7168 | ❌ | ❌ | Mixtral | Apache 2.0 |
Model Name | Parameters | Context Window | Supported Fine-Tuning Context Window | Adapter Pre-load Not Required | Supports GRPO | Architecture | License |
---|---|---|---|---|---|---|---|
solar-1-mini-chat-240612 | 10.7 billion | 32768 | 32768 | ✅ | ❌ | Llama | Custom License |
solar-pro-preview-instruct-v2 | 22.1 billion | 4096 | 4096 | ❌ | ❌ | Solar | Custom License |
solar-pro-241126 | 22.1 billion | 32768 | 16384 | ❌ | ❌ | Solar | Custom License |
Model Name | Parameters | Context Window | Supported Fine-Tuning Context Window | Adapter Pre-load Not Required | Supports GRPO | Architecture | License |
---|---|---|---|---|---|---|---|
gemma-2-27b | 27.2 billion | 8192 | 4096 | ❌ | ❌ | Gemma | |
gemma-2-27b-instruct | 27.2 billion | 8192 | 4096 | ❌ | ❌ | Gemma | |
gemma-2-9b | 9.24 billion | 8192 | 8192 | ❌ | ❌ | Gemma | |
gemma-2-9b-instruct | 9.24 billion | 8192 | 8192 | ❌ | ❌ | Gemma | |
gemma-7b | 8.5 billion | 8192 | 8192 | ❌ | ❌ | Gemma | |
gemma-7b-instruct | 8.5 billion | 8192 | 8192 | ❌ | ❌ | Gemma | |
gemma-2b | 2.5 billion | 8192 | 8192 | ❌ | ❌ | Gemma | |
gemma-2b-instruct | 2.5 billion | 8192 | 8192 | ❌ | ❌ | Gemma | |
Model Name | Parameters | Context Window | Supported Fine-Tuning Context Window | Adapter Pre-load Not Required | Supports GRPO | Architecture | License |
---|---|---|---|---|---|---|---|
phi-3-5-mini-instruct | 3.8 billion | 131072 | 16384 | ❌ | ✅ | Phi-3 | Microsoft |
phi-3-mini-4k-instruct | 3.8 billion | 4096 | 4096 | Turbo LoRA not supported | ✅ | Phi-3 | Microsoft |
phi-2 | 2.7 billion | 2048 | 2048 | Turbo not supported | ✅ | Phi-2 | Microsoft |
Model Name | Parameters | Context Window | Supported Fine-Tuning Context Window | Adapter Pre-load Not Required | Supports GRPO | Architecture | License |
---|---|---|---|---|---|---|---|
deepseek-r1-distill-qwen-32b | 32.8 billion | 131072 | 8000 | ❌ | ✅ | Qwen | DeepSeek-AI |
openhands-lm-32b-v0.1 | 32.8 billion | 131072 | 16384 | ❌ | ✅ | Qwen | Xingyao Wang |
Model Name | Parameters | Architecture | License | LoRA | Turbo LoRA | Turbo | Context Window | Supported Fine-Tuning Context Window | Supports GRPO |
---|---|---|---|---|---|---|---|---|---|
qwen2-vl-7b-instruct | 7B | Qwen2 | Tongyi Qianwen | ✅ | ❌ | ❌ | 128000 | 32768 | ❌ |
qwen2-5-vl-3b-instruct | 3B | Qwen2.5 | Tongyi Qianwen | ✅ | ❌ | ❌ | 128000 | 32768 | ❌ |
qwen2-5-vl-7b-instruct | 7B | Qwen2.5 | Tongyi Qianwen | ✅ | ❌ | ❌ | 128000 | 32768 | ❌ |
Models not listed above can also be fine-tuned by passing their Hugging Face path (for example, `BioMistral/BioMistral-7B`) as the `base_model`. To monitor training metrics in TensorBoard, add `show_tensorboard=True` to the `create` call.
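A rough sketch of such a `create` call is shown below. This is a hedged example, not the definitive API: the `Predibase`, `FinetuningConfig`, and `pb.adapters.create` names follow the Predibase Python SDK, but the exact keyword arguments (`dataset`, `repo`) and all resource names here are placeholder assumptions to verify against the SDK documentation for your installed version.

```python
# Hedged sketch only: the keyword arguments and placeholder names below are
# assumptions based on the Predibase Python SDK; verify against your SDK version.
import os

def launch_finetune():
    # Imported inside the function so this file parses without the SDK installed.
    from predibase import Predibase, FinetuningConfig

    pb = Predibase(api_token=os.environ["PREDIBASE_API_TOKEN"])
    return pb.adapters.create(
        config=FinetuningConfig(
            base_model="qwen2-5-7b-instruct",  # any model name from the tables
                                               # above, or a Hugging Face path
                                               # such as "BioMistral/BioMistral-7B"
        ),
        dataset="my_dataset",    # placeholder: an uploaded Predibase dataset
        repo="my-adapter-repo",  # placeholder: the adapter repository to write to
        show_tensorboard=True,   # surface training metrics in TensorBoard
    )
```

Note that when fine-tuning, the training context length is capped by the "Supported Fine-Tuning Context Window" column above, which for several models is smaller than the model's full serving context window.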