Different types of tasks supported for fine-tuning
Predibase supports several post-training fine-tuning methods, each designed for
specific use cases. This guide explains the different task types and how to use
them effectively.
Supervised Fine-Tuning (SFT) focuses on instruction tuning, where the model is
trained specifically on the completions (outputs) to improve its instruction
following capabilities. This method teaches the model to generate appropriate
responses to user prompts by learning from high-quality examples of
prompt-completion pairs. SFT supports two main formats:
Instruction Format: For task-oriented applications requiring specific
instructions
Uses prompt and completion columns in the dataset
Ideal for classification, translation, summarization, and creative tasks
Example dataset schema:
{"prompt": "Translate to French: Hello world", "completion": "Bonjour le monde"}
Chat Format: For conversational applications like chatbots and customer
support
Uses a messages column with JSON-style conversation format
Requires at least one user and one assistant role per conversation, as in the example row below
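For illustration, a single chat-format row might look like the following (the conversation content here is made up for this example):
{"messages": [{"role": "user", "content": "Where is my order #12345?"}, {"role": "assistant", "content": "Your order #12345 shipped yesterday and should arrive in 2-3 business days."}]}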
To fine-tune on either format, create an adapter with an SFTConfig:
from predibase import SFTConfig

# `pb` is an initialized Predibase client
adapter = pb.adapters.create(
    config=SFTConfig(
        base_model="llama-3-1-8b-instruct",
        apply_chat_template=True,  # Set to True if your dataset doesn't already have the chat template applied
    ),
    dataset="training_dataset",
)
Reinforcement Learning through Group Relative Policy Optimization (GRPO) is an
advanced fine-tuning method that applies reinforcement learning techniques to
optimize model behavior without requiring labeled data. Unlike traditional
Supervised Fine-Tuning, GRPO uses one or more programmable reward functions to
score the correctness of generated outputs during training, allowing the model
to self-improve by iteratively refining its responses. This approach is particularly effective for:
Reasoning tasks where Chain-of-Thought (CoT) reasoning improves on base performance
Scenarios where explicit labels aren’t available but there’s an objective
metric
Optimizing model behavior without extensive human feedback
Developing generalized strategies for solving tasks
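As a concrete illustration of a programmable reward function, the sketch below scores a completion on whether its final number matches a known answer. The function name and signature are hypothetical examples for this guide, not the exact interface Predibase expects:

import re

def math_answer_reward(prompt: str, completion: str, expected_answer: str) -> float:
    """Illustrative reward: full credit if the last number in the completion
    matches the expected answer, partial credit for producing any number."""
    numbers = re.findall(r"-?\d+(?:\.\d+)?", completion)
    if not numbers:
        return 0.0
    return 1.0 if numbers[-1] == expected_answer else 0.1

# The completion's reasoning ends with the correct answer, so the reward is 1.0
print(math_answer_reward("What is 6 * 7?", "6 * 7 = 42, so the answer is 42", "42"))

During GRPO training, rewards like this are computed over groups of sampled completions, and the relative scores steer the model toward higher-reward outputs.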
Continued Pre-Training extends the original pretraining phase of the model by
training over your text data with the next-token prediction loss. This approach
allows the model to further adapt to domain-specific language patterns and
knowledge, improving its overall language understanding and generation
capabilities. This method is especially valuable for adapting a model to
specialized domains, such as legal, medical, or financial text, where the base
model has had limited exposure.
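To make the objective concrete, here is a minimal sketch of the next-token prediction (causal language modeling) loss. It is purely illustrative and not part of the Predibase SDK; the tensor shapes are made up:

import torch
import torch.nn.functional as F

# Toy setup: a batch of token ids and model logits (shapes are arbitrary)
vocab_size, batch, seq_len = 100, 2, 8
input_ids = torch.randint(0, vocab_size, (batch, seq_len))
logits = torch.randn(batch, seq_len, vocab_size)  # in practice, produced by the model

# Next-token objective: the prediction at position t is scored against the token at t+1
shift_logits = logits[:, :-1, :].reshape(-1, vocab_size)
shift_labels = input_ids[:, 1:].reshape(-1)
loss = F.cross_entropy(shift_logits, shift_labels)
print(loss.item())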
Function calling fine-tuning enables models to learn which function calls to make
based on user requests. This specialized form of fine-tuning teaches models to:
Select the appropriate function to call for a given request
Format arguments correctly according to function schemas
Handle function responses and incorporate them into replies
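As a purely hypothetical illustration (the column names and schema here are not the required Predibase format), a function-calling training example pairs a user request with the tool call the model should emit:
{"messages": [{"role": "user", "content": "What's the weather in Paris right now?"}, {"role": "assistant", "content": "{\"name\": \"get_current_weather\", \"arguments\": {\"city\": \"Paris\"}}"}]}
At inference time, the caller executes the emitted function call, and the function's response can be passed back to the model so it can incorporate the result into its reply.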