Fine-tuning with very large datasets in Predibase
Load your dataset
Initialize the tokenizer
Batch tokenize your data
Tokenize the prompt and completion columns independently.
Tokenize the text column.
Create input_ids and labels
Create a split column (optional)
Save the dataset
Upload the dataset to Predibase
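
The "Create input_ids and labels" and "Create a split column" steps can be sketched as follows. This is a minimal illustration, not Predibase's own implementation, and it rests on several assumptions: the prompt and completion have already been tokenized separately into lists of token ids; prompt positions are masked in labels with -100 (the default ignore_index of PyTorch's CrossEntropyLoss) so that only completion tokens contribute to the loss; and the split values "train" and "evaluation" are the ones the platform expects. The helper names build_example and assign_split are hypothetical.

```python
import zlib

# Assumption: -100 marks positions the loss should skip. It is the default
# ignore_index of torch.nn.CrossEntropyLoss; check what your trainer expects.
IGNORE_INDEX = -100


def build_example(prompt_ids, completion_ids, eos_id=None):
    """Combine separately tokenized prompt and completion ids into one row.

    input_ids is the prompt followed by the completion (plus an optional
    end-of-sequence id). labels has the same length, with every prompt
    position replaced by IGNORE_INDEX.
    """
    completion = list(completion_ids)
    if eos_id is not None:
        completion.append(eos_id)
    input_ids = list(prompt_ids) + completion
    labels = [IGNORE_INDEX] * len(prompt_ids) + completion
    return {"input_ids": input_ids, "labels": labels}


def assign_split(row_key, eval_fraction=0.1):
    """Deterministically assign a row to a split based on a hash of its key.

    Assumption: the split column takes the string values "train" and
    "evaluation". Hashing keeps the assignment stable across reruns.
    """
    bucket = zlib.crc32(row_key.encode("utf-8")) % 1000
    return "evaluation" if bucket < eval_fraction * 1000 else "train"


if __name__ == "__main__":
    # Toy token ids stand in for real tokenizer output.
    row = build_example([1, 2, 3], [4, 5], eos_id=0)
    row["split"] = assign_split("row-0")
    print(row)
```

Hashing a stable row key, rather than drawing random numbers, means a row lands in the same split every time the preprocessing is rerun, which matters when a very large dataset is processed in several batches.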