Predibase supports several post-training fine-tuning methods, each designed for specific use cases. This guide explains the different task types and how to use them effectively.

Supervised Fine-tuning

Supervised Fine-Tuning (SFT) focuses on instruction tuning, where the model is trained specifically on the completions (outputs) to improve its instruction following capabilities. This method teaches the model to generate appropriate responses to user prompts by learning from high-quality examples of prompt-completion pairs.

SFT supports two main formats:

  1. Instruction Format: For task-oriented applications requiring specific instructions

    • Uses prompt and completion columns in the dataset
    • Ideal for classification, translation, summarization, and creative tasks
    • Example dataset schema:
      {"prompt": "Translate to French: Hello world", "completion": "Bonjour le monde"}
      
  2. Chat Format: For conversational applications like chatbots and customer support

    • Uses a messages column with JSON-style conversation format
    • Requires at least one user and one assistant role per conversation
    • Example dataset schema:
      {"messages": [{"role": "user", "content": "Hi"}, {"role": "assistant", "content": "Hello!"}]}
      

Example configuration:

from predibase import Predibase, SFTConfig

pb = Predibase(api_token="<PREDIBASE_API_TOKEN>")

adapter = pb.adapters.create(
    config=SFTConfig(
        base_model="llama-3-1-8b-instruct",
        apply_chat_template=True,  # Set to True if your dataset doesn't already have the chat template applied
    ),
    dataset="training_dataset",
)
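
Here, dataset refers to a dataset that has already been uploaded or connected in Predibase. A minimal sketch of registering a local file first, assuming an illustrative file name and the pb.datasets.from_file helper (see the SDK reference for supported sources and file types):

# Upload a local JSONL file and register it as a Predibase dataset.
dataset = pb.datasets.from_file("train.jsonl", name="training_dataset")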

Reinforcement Fine-tuning

Reinforcement Learning through Group Relative Policy Optimization (GRPO) is an advanced fine-tuning method that applies reinforcement learning techniques to optimize model behavior without requiring labeled data. Unlike traditional Supervised Fine-Tuning, GRPO uses one or more programmable reward functions to score the correctness of generated outputs during training, allowing the model to self-improve by iteratively refining its responses.

This approach is particularly effective for:

  • Reasoning tasks where Chain-of-Thought (CoT) reasoning helps improve base performance
  • Scenarios where explicit labels aren’t available but there’s an objective metric
  • Optimizing model behavior without extensive human feedback
  • Developing generalized strategies for solving tasks
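
The reward functions passed to the configuration below are ordinary Python callables that you define. A minimal sketch, assuming each function receives the prompt, the generated completion, and the corresponding dataset row and returns a numeric score (check the Predibase GRPO guide for the exact signature and registration details):

# Illustrative reward functions; the signature and dataset fields shown here are assumptions.
def reward_func_1(prompt: str, completion: str, example: dict) -> float:
    # Reward exact matches against a reference answer stored in the dataset row.
    return 1.0 if completion.strip() == example.get("answer", "").strip() else 0.0

def reward_func_2(prompt: str, completion: str, example: dict) -> float:
    # Penalize overly long responses to encourage concise answers.
    return max(0.0, 1.0 - len(completion) / 2000)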

Example configuration:

from predibase import GRPOConfig

adapter = pb.adapters.create(
    config=GRPOConfig(
        base_model="llama-3-1-8b-instruct",
        reward_fns=[
            reward_func_1,
            reward_func_2,
        ],
        ...
    ),
    dataset="training_dataset",
)

Learn more about how to do Reinforcement Learning fine-tuning on Predibase →

Continued Pre-Training

Continued Pre-Training extends the model's original pretraining phase by training on your own text data with the next-token prediction objective. This allows the model to further adapt to domain-specific language patterns and knowledge, improving its overall language understanding and generation capabilities.
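
For reference, the next-token prediction objective is the standard causal language modeling loss. A small illustration using Hugging Face transformers (not part of the Predibase SDK; the model here is just a stand-in):

from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

inputs = tokenizer("The court ruled in favor of the plaintiff.", return_tensors="pt")
# Passing labels=input_ids makes the model compute the shifted next-token
# cross-entropy loss used during (continued) pre-training.
loss = model(**inputs, labels=inputs["input_ids"]).loss
print(loss.item())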

This method is especially valuable for:

  • Domain adaptation (legal, medical, technical documentation)
  • Learning new vocabulary or writing styles
  • Incorporating new knowledge not present in the original training data
  • Improving performance on domain-specific tasks

The dataset requires a single text column containing the training sequences.
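
For example, a JSONL dataset with a single text column might look like this (the column name and contents are illustrative; see the dataset documentation for the expected column name):

{"text": "Section 2.1 of the agreement governs the indemnification obligations of each party..."}
{"text": "The tenant shall provide written notice no later than thirty days before vacating..."}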

Example configuration:

from predibase import ContinuedPretrainingConfig

adapter = pb.adapters.create(
    config=ContinuedPretrainingConfig(
        base_model="llama-3-1-8b-instruct",
    ),
    dataset="domain_text_dataset",
)

Learn more about how to do continued pre-training on Predibase →

Function Calling

Function calling fine-tuning prepares models to work with external tools defined by function schemas. This specialized form of fine-tuning teaches models to:

  • Make appropriate function calls based on user requests
  • Format arguments correctly according to function schemas
  • Handle function responses and incorporate them into replies

This is particularly useful for:

  • Building tool-using agents
  • API integration
  • Structured output generation

The dataset requires a specific format with tool definitions and examples of their usage.
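
As a rough illustration only (the exact schema is covered in the function calling guide linked below), a training row typically pairs a conversation with the tool definitions the model may call; the field names here are assumptions, not the required schema:

{
  "messages": [
    {"role": "user", "content": "What's the weather in Paris?"},
    {"role": "assistant", "content": "", "tool_calls": [{"name": "get_weather", "arguments": {"city": "Paris"}}]}
  ],
  "tools": [
    {"name": "get_weather", "description": "Get the current weather for a city", "parameters": {"type": "object", "properties": {"city": {"type": "string"}}, "required": ["city"]}}
  ]
}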

Learn more about how to do Function Calling fine-tuning on Predibase →

Next Steps