Fine-tuning with very large datasets in Predibase
Load your dataset
Initialize the tokenizer
Batch tokenize your data
prompt and completion columns independently.
text column.
Create input_ids and labels
Create a split column (optional)
Save the dataset
Upload the dataset to Predibase
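
The "Create input_ids and labels" and "Create a split column" steps above can be sketched roughly as follows. This is a minimal illustration, not Predibase's documented procedure. It assumes the prompt and completion have already been tokenized independently into lists of token ids, that prompt positions are masked in the labels with -100 (the value PyTorch's cross-entropy loss ignores by default), and that the split column takes the values "train" and "evaluation". The helper names `build_example` and `assign_split` are hypothetical.

```python
import random

# Assumption: -100 marks label positions that the loss should skip
# (the default ignore_index of PyTorch's cross-entropy loss).
IGNORE_INDEX = -100


def build_example(prompt_ids, completion_ids):
    """Join separately tokenized prompt and completion ids into one row.

    input_ids holds the full sequence; labels masks the prompt so that
    only completion tokens contribute to the training loss.
    """
    input_ids = list(prompt_ids) + list(completion_ids)
    labels = [IGNORE_INDEX] * len(prompt_ids) + list(completion_ids)
    return {"input_ids": input_ids, "labels": labels}


def assign_split(n_rows, eval_fraction=0.1, seed=0):
    """Return one split value per row (hypothetical values, see lead-in)."""
    rng = random.Random(seed)
    return [
        "evaluation" if rng.random() < eval_fraction else "train"
        for _ in range(n_rows)
    ]


# Toy token ids standing in for real tokenizer output.
row = build_example([101, 7592], [2088, 102])
print(row["input_ids"])  # [101, 7592, 2088, 102]
print(row["labels"])     # [-100, -100, 2088, 102]
```

For a dataset with only a text column, the same idea would presumably reduce to using the tokenized text for both input_ids and labels, with no masking.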
