GRPO for Countdown
Fine-tune a model to play Countdown using reinforcement learning
This example demonstrates how to use the Predibase SDK to train a model to play Countdown with reinforcement fine-tuning.
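Reinforcement fine-tuning needs a programmatic way to score each model completion. The page does not show the reward used for this example, so the following is only a minimal sketch under stated assumptions: Countdown is taken to mean the numbers game in which a set of given integers is combined with `+`, `-`, `*`, and `/` to reach a target, each number may be used at most once, and the model's answer has already been extracted as a plain arithmetic expression. The function name `countdown_reward` and its signature are hypothetical and are not part of the Predibase SDK.

```python
import ast
import operator
from collections import Counter
from fractions import Fraction

# Only the four basic arithmetic operators are allowed (an assumption about the rules).
_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}


def _evaluate(node, used):
    """Evaluate a parsed expression exactly, recording every integer literal it uses."""
    if isinstance(node, ast.Constant) and type(node.value) is int:
        used.append(node.value)
        return Fraction(node.value)
    if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
        left = _evaluate(node.left, used)
        right = _evaluate(node.right, used)
        return _OPS[type(node.op)](left, right)
    raise ValueError("unsupported expression")


def countdown_reward(expression, numbers, target):
    """Hypothetical reward: 1.0 if the expression reaches the target using only
    the given numbers (each at most once), otherwise 0.0."""
    try:
        tree = ast.parse(expression, mode="eval")
        used = []
        value = _evaluate(tree.body, used)
    except (SyntaxError, ValueError, ZeroDivisionError):
        return 0.0
    # Counter subtraction keeps only numbers used more often than they are available.
    if Counter(used) - Counter(numbers):
        return 0.0
    return 1.0 if value == target else 0.0
```

For example, `countdown_reward("(25 - 5) * 3", [25, 5, 3, 7], 60)` scores a correct solution, while an expression that reuses a number, misses the target, or fails to parse scores zero. Parsing with `ast` instead of calling `eval` keeps arbitrary model output from being executed, and `Fraction` avoids floating-point error in division.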