Supervised Fine-Tuning
Supervised Fine-Tuning (SFT) focuses on instruction tuning, where the model is trained specifically on the completions (outputs) to improve its instruction-following capabilities. This method teaches the model to generate appropriate responses to user prompts by learning from high-quality examples of prompt-completion pairs. SFT supports two main formats:
- Instruction Format: For task-oriented applications requiring specific instructions
  - Uses `prompt` and `completion` columns in the dataset
  - Ideal for classification, translation, summarization, and creative tasks
- Chat Format: For conversational applications like chatbots and customer support
  - Uses a `messages` column with JSON-style conversation format
  - Requires at least one `user` and one `assistant` role per conversation
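As a sketch of what rows in each format can look like (the column names `prompt`, `completion`, and `messages` come from the description above; the exact serialization depends on your training stack, JSON Lines is assumed here):

```python
import json

# Instruction format: one prompt/completion pair per row.
instruction_rows = [
    {"prompt": "Translate to French: Good morning",
     "completion": "Bonjour"},
    {"prompt": "Summarize: The meeting covered Q3 revenue and hiring plans.",
     "completion": "Q3 revenue and hiring plans were discussed."},
]

# Chat format: one conversation per row in a "messages" column,
# with at least one user and one assistant turn.
chat_rows = [
    {"messages": [
        {"role": "user", "content": "What is the capital of France?"},
        {"role": "assistant", "content": "The capital of France is Paris."},
    ]},
]

with open("sft_instruction.jsonl", "w") as f:
    for row in instruction_rows:
        f.write(json.dumps(row) + "\n")

with open("sft_chat.jsonl", "w") as f:
    for row in chat_rows:
        f.write(json.dumps(row) + "\n")
```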
Reinforcement Fine-tuning
Reinforcement Learning through Group Relative Policy Optimization (GRPO) is an advanced fine-tuning method that applies reinforcement learning techniques to optimize model behavior without requiring labeled data. Unlike traditional Supervised Fine-Tuning, GRPO uses one or more programmable reward functions to score the correctness of generated outputs during training, allowing the model to self-improve by iteratively refining its responses. This approach is particularly effective for:
- Reasoning tasks where Chain-of-Thought (CoT) helps improve base performance
- Scenarios where explicit labels aren’t available but there’s an objective metric
- Optimizing model behavior without extensive human feedback
- Developing generalized strategies for solving tasks
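A reward function here is simply a program that scores a generated completion against an objective check; the exact signature is framework-specific, so the function names and signatures below are illustrative, not a real API. The sketch also shows the "group relative" idea: each sampled completion is scored against the mean reward of its group rather than a learned value function.

```python
import re

def exact_answer_reward(completion: str, target: str) -> float:
    """Score 1.0 if the final number in the completion matches the
    target answer, else 0.0 -- no labeled reasoning trace is needed,
    only an objective check on the final result."""
    numbers = re.findall(r"-?\d+(?:\.\d+)?", completion)
    if not numbers:
        return 0.0
    return 1.0 if numbers[-1] == target else 0.0

def group_relative_advantages(rewards: list[float]) -> list[float]:
    """Rank each completion in a sampled group relative to its
    siblings by subtracting the group's mean reward."""
    mean = sum(rewards) / len(rewards)
    return [r - mean for r in rewards]
```

For a group of samples to one prompt, `[exact_answer_reward(c, "42") for c in samples]` yields the rewards, and `group_relative_advantages` turns them into the per-sample signal used to update the policy.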
Continued Pre-Training
Continued Pre-Training extends the original pretraining phase of the model by training on your text data with the next-token prediction objective. This approach allows the model to further adapt to domain-specific language patterns and knowledge, improving its overall language understanding and generation capabilities. This method is especially valuable for:
- Domain adaptation (legal, medical, technical documentation)
- Learning new vocabulary or writing styles
- Incorporating new knowledge not present in the original training data
- Improving performance on domain-specific tasks
The dataset requires a single `text` column containing the training sequences.
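For instance (example texts are illustrative, sketching a legal-domain adaptation scenario), each row carries one raw training sequence and nothing else:

```python
# Each row holds one raw training sequence in a single "text" column;
# the next-token objective needs no prompt/completion split.
rows = [
    {"text": "The plaintiff filed a motion for summary judgment, arguing that"
             " no genuine dispute of material fact remained."},
    {"text": "Section 4.2 of the agreement governs termination rights and"
             " notice periods for both parties."},
]

# Every row must expose exactly the one required string column.
assert all(set(row) == {"text"} and isinstance(row["text"], str)
           for row in rows)
```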
Function Calling
Function calling fine-tuning enables models to learn which function calls to make based on user requests. This specialized form of fine-tuning teaches models to:
- Make appropriate function calls based on user requests
- Format arguments correctly according to function schemas
- Handle function responses and incorporate them into replies
This capability is useful for:
- Building tool-using agents
- API integration
- Structured output generation
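A single training example for this task typically bundles the tool schema, the user request, the assistant turn that calls the tool with correctly formatted arguments, and the follow-up reply. The field names below follow common chat-tool conventions and may differ from your provider's exact schema:

```python
import json

# One function-calling training example (field names are illustrative).
example = {
    "tools": [{
        "name": "get_weather",
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    }],
    "messages": [
        {"role": "user", "content": "What's the weather in Paris?"},
        # Assistant turn: arguments must validate against the schema above.
        {"role": "assistant",
         "tool_calls": [{
             "name": "get_weather",
             "arguments": json.dumps({"city": "Paris"}),
         }]},
        # Tool result is fed back so the model learns to use it in replies.
        {"role": "tool", "name": "get_weather", "content": "18°C, cloudy"},
        {"role": "assistant",
         "content": "It's currently 18°C and cloudy in Paris."},
    ],
}
```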
Classification
Classification fine-tuning enables training LLMs for classification tasks, ensuring your model always outputs one of the valid labels and typically achieving better performance than instruction tuning. The dataset requires two string columns: `text` and `label`.
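For example (the label set and texts are illustrative, sketching sentiment classification), each row pairs the two required string columns:

```python
# Both columns are string-typed; every label must come from the task's
# fixed set of valid labels.
valid_labels = {"positive", "negative"}

rows = [
    {"text": "The product arrived broken and support never replied.",
     "label": "negative"},
    {"text": "Fast shipping and great quality!",
     "label": "positive"},
]

assert all(row["label"] in valid_labels for row in rows)
```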
Next Steps
- Learn about dataset preparation for each task type
- Explore different adapter types for fine-tuning
- Start evaluating your fine-tuned models