
GRPO for Countdown

Fine-tune a model to play Countdown using reinforcement learning

This example demonstrates how to use the Predibase SDK to train a model to play Countdown with reinforcement fine-tuning, using GRPO (Group Relative Policy Optimization).
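Reinforcement fine-tuning needs a reward that scores each completion the model generates. The following is a minimal sketch of such a reward for Countdown. It assumes three things: the model is prompted to put its final equation inside `<answer>...</answer>` tags, only `+ - * /` and parentheses are allowed, and each provided number must be used exactly once. The function `countdown_reward` and its arguments are hypothetical and are not part of the Predibase SDK.

```python
import ast
import operator
import re
from collections import Counter

# Assumption: Countdown allows only these four binary operators.
_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}


def _safe_eval(node: ast.AST) -> float:
    """Evaluate an AST made only of numeric literals and + - * / (no eval())."""
    if isinstance(node, ast.Expression):
        return _safe_eval(node.body)
    if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
        return node.value
    if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
        return _OPS[type(node.op)](_safe_eval(node.left), _safe_eval(node.right))
    raise ValueError("unsupported expression")


def countdown_reward(completion: str, numbers: list, target: int) -> float:
    """Hypothetical reward: 1.0 if the <answer> equation is valid and hits target, else 0.0."""
    match = re.search(r"<answer>(.*?)</answer>", completion, re.DOTALL)
    if match is None:
        return 0.0  # no parseable answer block
    equation = match.group(1).strip()
    # Reject anything other than digits, whitespace, operators, and parentheses.
    if not re.fullmatch(r"[\d\s+\-*/()]+", equation):
        return 0.0
    # Assumption: every provided number is used exactly once.
    used = [int(n) for n in re.findall(r"\d+", equation)]
    if Counter(used) != Counter(numbers):
        return 0.0
    try:
        value = _safe_eval(ast.parse(equation, mode="eval"))
    except (SyntaxError, ValueError, ZeroDivisionError):
        return 0.0
    return 1.0 if abs(value - target) < 1e-6 else 0.0


print(countdown_reward("<answer>(100 - 4) * 2</answer>", [100, 4, 2], 192))  # 1.0
print(countdown_reward("<answer>100 + 4 + 2</answer>", [100, 4, 2], 192))    # 0.0
```

In a GRPO run, a function like this scores each completion sampled for a prompt, and GRPO compares the scores within that group to decide which completions to reinforce. A binary reward keeps the sketch simple; practical setups often add a separate, partial reward for correct formatting. How reward functions are registered with the Predibase SDK, and the exact signature it expects, may differ from this sketch.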
