Fine-tuning with very large datasets in Predibase
This approach is available for the sft and continued_pretraining task types. It is not available for classification or GRPO task types at this time.
Load your dataset
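A minimal sketch of this step, assuming the raw data is stored as JSON Lines (one JSON object per line). Reading in fixed-size batches keeps a very large file from being loaded into memory at once. The helper name `iter_batches`, the file layout, and the default batch size are illustrative assumptions, not part of Predibase.

```python
import json
from itertools import islice
from typing import Iterator


def iter_batches(path: str, batch_size: int = 1000) -> Iterator[list[dict]]:
    """Yield lists of up to `batch_size` rows from a JSON Lines file."""
    with open(path, "r", encoding="utf-8") as f:
        # Parse lazily, one row at a time, skipping blank lines.
        rows = (json.loads(line) for line in f if line.strip())
        while True:
            batch = list(islice(rows, batch_size))
            if not batch:
                break
            yield batch
```

Each yielded batch is a plain list of dictionaries, so later steps can process one batch at a time.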
Initialize the tokenizer
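One possible way to do this, assuming the Hugging Face `transformers` library is installed and the base model's tokenizer is published on the Hugging Face Hub. The wrapper `load_tokenizer` is a hypothetical helper, and the model identifier is left as a parameter because the original text does not name one.

```python
def load_tokenizer(model_name: str):
    """Load the tokenizer that matches the base model to be fine-tuned."""
    # Imported lazily so this helper can be defined without transformers installed.
    from transformers import AutoTokenizer

    # Downloads (or reads from the local cache) the tokenizer files for `model_name`.
    return AutoTokenizer.from_pretrained(model_name)
```

The tokenizer should match the base model you intend to fine-tune, since token IDs produced by one tokenizer are generally not valid for a different model.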
Batch tokenize your data
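For datasets with prompt and completion columns, tokenize the prompt and completion columns independently. For datasets with a single text column, tokenize the text column.
Create input_ids and labels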
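The tokenization and label-construction steps can be sketched together as follows. This is an illustration under stated assumptions, not the definitive Predibase recipe: prompt tokens are masked in `labels` with -100 (the value that PyTorch and Hugging Face loss functions ignore by default), an EOS token is appended to each completion, and `tokenize_batch` is a hypothetical callable that maps a list of strings to a list of token-ID lists.

```python
from typing import Callable

IGNORE_INDEX = -100  # assumption: label value that the training loss skips

TokenizeBatch = Callable[[list[str]], list[list[int]]]


def build_sft_batch(batch: dict, tokenize_batch: TokenizeBatch, eos_token_id: int) -> dict:
    """Build input_ids and labels from separate prompt and completion columns."""
    # Tokenize each column on its own so the prompt/completion boundary is known.
    prompt_ids = tokenize_batch(batch["prompt"])
    completion_ids = tokenize_batch(batch["completion"])

    input_ids, labels = [], []
    for p, c in zip(prompt_ids, completion_ids):
        c = c + [eos_token_id]  # assumption: end every completion with EOS
        input_ids.append(p + c)
        # Mask the prompt so only completion tokens contribute to the loss.
        labels.append([IGNORE_INDEX] * len(p) + c)
    return {"input_ids": input_ids, "labels": labels}


def build_text_batch(batch: dict, tokenize_batch: TokenizeBatch) -> dict:
    """Build input_ids and labels from a single text column (no masking)."""
    ids = tokenize_batch(batch["text"])
    return {"input_ids": ids, "labels": [list(x) for x in ids]}
```

With a Hugging Face tokenizer, `tokenize_batch` could be `lambda texts: tokenizer(texts, add_special_tokens=False)["input_ids"]`, which tokenizes the whole list in one call.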
Create a split column (optional)
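A small sketch of one way to add the column, assuming a random row-level assignment is acceptable. The split values `train` and `evaluation`, the helper name `add_split_column`, and the default 5% evaluation fraction are assumptions made for illustration; check the values Predibase expects before relying on them.

```python
import random


def add_split_column(rows: list[dict], eval_fraction: float = 0.05, seed: int = 42) -> list[dict]:
    """Return copies of `rows` with a `split` column assigned at random."""
    rng = random.Random(seed)  # fixed seed keeps the assignment reproducible
    return [
        {**row, "split": "evaluation" if rng.random() < eval_fraction else "train"}
        for row in rows
    ]
```

Because the input rows are copied rather than modified, the function can be rerun with a different fraction or seed without reloading the data.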
Save the dataset
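As a rough sketch, the tokenized rows can be streamed to disk one at a time. JSON Lines is used here only because it needs no third-party libraries; the file format expected for this workflow may differ (for example Parquet), so treat the format and the helper name `save_jsonl` as assumptions.

```python
import json
from typing import Iterable


def save_jsonl(rows: Iterable[dict], path: str) -> int:
    """Write rows to `path` as JSON Lines and return the number of rows written."""
    count = 0
    with open(path, "w", encoding="utf-8") as f:
        for row in rows:
            # One JSON object per line; rows are written as they arrive.
            f.write(json.dumps(row) + "\n")
            count += 1
    return count
```

Since `rows` can be any iterable, a generator of processed batches can be passed in directly without building the full dataset in memory first.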
Upload the dataset to Predibase