
Task Types

Predibase supports three post-training methods, each selected by a task type: supervised fine-tuning, continued pretraining, and reinforcement learning through GRPO.

Supervised Fine-Tuning

Supervised Fine-Tuning (SFT) focuses on instruction tuning, where the model is trained specifically on the completions (outputs) to improve its instruction-following capabilities. The model learns to generate appropriate responses to user prompts from high-quality examples of prompt-completion pairs. SFT is particularly effective at improving a model's ability to understand and execute specific instructions, which makes it more useful for task-oriented applications.
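To make "trained specifically on the completions" concrete, here is a minimal sketch of completion-only loss masking in plain Python. It is an illustration of the general technique, not Predibase's implementation: the helper names, the token ids, and the log-probabilities are all hypothetical, and the -100 ignore value is a common convention rather than something this page specifies.

```python
import math

# Assumption: -100 is a widely used "ignore this position" label value.
IGNORE_INDEX = -100


def build_sft_labels(prompt_ids, completion_ids):
    """Concatenate prompt and completion; mask prompt positions out of the loss."""
    input_ids = list(prompt_ids) + list(completion_ids)
    labels = [IGNORE_INDEX] * len(prompt_ids) + list(completion_ids)
    return input_ids, labels


def masked_nll(token_log_probs, labels):
    """Average negative log-likelihood over unmasked (completion) positions only.

    token_log_probs[i] is assumed to be the log-probability the model
    assigned to the target token at position i (already aligned with labels).
    """
    kept = [-lp for lp, lab in zip(token_log_probs, labels) if lab != IGNORE_INDEX]
    return sum(kept) / len(kept)


# Hypothetical token ids: a 3-token prompt followed by a 2-token completion.
input_ids, labels = build_sft_labels([11, 12, 13], [21, 22])

# Hypothetical per-position log-probabilities from a model.
log_probs = [math.log(p) for p in (0.1, 0.2, 0.3, 0.5, 0.25)]

# Only the last two positions (the completion) contribute to the loss.
loss = masked_nll(log_probs, labels)
```

The prompt tokens still appear in the input, so the model conditions on them, but only the completion tokens produce a training signal.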

To train your model for instruction following, you can set your task type to sft in our SDK.

Continued Pretraining

Continued Pretraining extends the original pretraining phase of the model by training on your text data with the next-token prediction objective. This lets the model adapt further to domain-specific language patterns and knowledge, improving its language understanding and generation in that domain. Continued Pretraining is especially valuable when adapting a general-purpose model to a specialized domain or when incorporating knowledge that wasn't present in the original training data.
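The next-token objective can be sketched in a few lines of plain Python. This is an illustrative toy, not the Predibase training loop: the function names, token ids, and predicted probabilities are hypothetical. Unlike the SFT case, every position in the text contributes to the loss.

```python
import math


def next_token_pairs(token_ids):
    """Shift by one: the input at position i is used to predict token i + 1."""
    return token_ids[:-1], token_ids[1:]


def next_token_loss(predicted_probs, targets):
    """Mean cross-entropy over all positions.

    predicted_probs[i] maps candidate token id -> probability at position i.
    """
    total = -sum(math.log(p[t]) for p, t in zip(predicted_probs, targets))
    return total / len(targets)


# Hypothetical tokenized snippet of raw domain text.
tokens = [5, 8, 2, 9]
inputs, targets = next_token_pairs(tokens)

# Hypothetical model predictions at each of the three input positions.
probs = [
    {8: 0.5, 2: 0.5},
    {2: 0.25, 9: 0.75},
    {9: 1.0},
]

loss = next_token_loss(probs, targets)
```

Because the targets come from the text itself, this objective needs no prompt-completion pairs, only raw text.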

To train your model for continued pretraining, you can set your task type to continued_pretraining in our SDK.

Reinforcement Learning through Group Relative Policy Optimization (GRPO)

Reinforcement Learning through GRPO is an advanced fine-tuning method that applies reinforcement learning techniques to optimize model behavior without requiring labeled data. Unlike traditional Supervised Fine-Tuning, GRPO uses one or more programmable reward functions to score the correctness of generated outputs during training, allowing the model to self-improve by iteratively refining its responses. This approach is particularly effective for reasoning tasks where chain-of-thought (CoT) reasoning improves base performance, as well as for scenarios where explicit labels aren't available but an objective metric can determine the correctness of a model's output.
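As a rough sketch of how programmable rewards and group-relative scoring fit together, the example below scores a group of completions for one prompt and normalizes the rewards within the group. Everything here is an assumption for illustration: the reward function names and signatures, the <answer> tag format, the sample completions, and the use of population standard deviation with a small epsilon. It does not reflect the reward function interface of the Predibase SDK.

```python
import re
import statistics

ANSWER = re.compile(r"<answer>(.+?)</answer>", re.DOTALL)


def format_reward(completion):
    """Hypothetical reward: 1.0 if the completion contains an <answer> tag."""
    return 1.0 if ANSWER.search(completion) else 0.0


def answer_reward(completion, is_valid):
    """Hypothetical reward: 1.0 if the extracted answer passes a programmatic check."""
    match = ANSWER.search(completion)
    return 1.0 if match and is_valid(match.group(1).strip()) else 0.0


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize each reward against the mean and spread of its own group."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


def squares_to_49(text):
    """Objective check with no stored label: is the answer an integer x with x*x == 49?"""
    return text.lstrip("-").isdigit() and int(text) ** 2 == 49


# Hypothetical group of sampled completions for "Find an integer x with x^2 = 49".
completions = [
    "<think>7 * 7 = 49</think><answer>7</answer>",
    "<answer>-7</answer>",
    "<answer>6</answer>",
    "The answer is 7",
]

rewards = [format_reward(c) + answer_reward(c, squares_to_49) for c in completions]
advantages = group_relative_advantages(rewards)
```

Completions that score above the group average get positive advantages and are reinforced; those below it get negative advantages. Note that both 7 and -7 earn full reward, since the check verifies the answer rather than comparing it to a single labeled target.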

To train your model through reinforcement learning, you can set your task type to grpo in our SDK.

You can read more about GRPO here.