Preparing and uploading datasets for fine-tuning
Troubleshooting and Common Errors
ValueError: Trailing data - This generally occurs when a dataset formatted as a JSONL file is uploaded
with a .json file extension. Try uploading the same dataset with the .jsonl file extension.No ':' found when decoding object value - This generally occurs with malformed JSON. Check that the dataset file is
formatted correctly.C error: Expected x fields in line y, saw z - This generally occurs when one or more rows in the dataset contains
too many or too few entries. Check the error message for the problematic line and
make sure that it is formatted correctly. Also, make sure the dataset is
formatted as specified in the section below (How to Structure Your Dataset)ValueError: Expected object or value - This generally occurs with malformed JSON. Check if the code snippet above
can properly read the dataset. The problem may involve the encoding or the
structure of the json file.train or evaluation. To learn more,
check out this section.apply_chat_template=True
in your fine-tuning config. This will automatically apply the appropriate
chat template for the base model.
Note: This is only applicable for instruction tuned models.
user role and one assistant role. weight (0 or 1) can be
passed in for assistant messages to determine whether or not they are used for
calculating loss (0 means no, 1 means yes, defaults to 1).train or evaluation. To learn more,
check out this section..jsonl format.
Example of chat dataset:
messages:
train or evaluation. To learn more,
check out this section.train or evaluation. To learn more,
check out this section.train or evaluation. To learn more,
check out this section. We recommend skipping
the split column for large-scale training.