# Evaluation

Evaluating the performance of your fine-tuned adapters.
After fine-tuning your model, it’s crucial to evaluate its performance to ensure it meets your requirements and to identify areas for improvement. Predibase provides several tools and methods for evaluating your fine-tuned adapters.
## Evaluation Methods
You can evaluate your fine-tuned models in two ways:
- Online Evaluation: Test your model’s performance in real time through the Predibase API (see the sketch after this list)
- Offline Evaluation: Batch evaluate your model’s performance on a test dataset (coming soon)
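As a rough illustration of the online path (not a substitute for the API reference), the sketch below sends a single prompt to a deployment with a fine-tuned adapter applied, using the Predibase Python SDK. The deployment name `llama-3-1-8b-instruct`, the adapter reference `my-adapter-repo/1`, and the prompt are placeholder assumptions, and the exact client interface may vary between SDK versions.

```python
from predibase import Predibase

# Authenticate with your Predibase API token (placeholder value).
pb = Predibase(api_token="<PREDIBASE_API_TOKEN>")

# Get a client for the serving deployment that hosts the base model.
# "llama-3-1-8b-instruct" and "my-adapter-repo/1" are placeholders for your
# own deployment name and adapter repo/version.
client = pb.deployments.client("llama-3-1-8b-instruct")

# Generate with the fine-tuned adapter applied on top of the base model.
response = client.generate(
    "Classify the sentiment of this review: 'The product exceeded my expectations.'",
    adapter_id="my-adapter-repo/1",
    max_new_tokens=64,
)
print(response.generated_text)
```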
## Evaluation Harness
We provide an evaluation harness as part of our LoRA Bakeoff repository. This harness allows you to:
- Compare different fine-tuning approaches
- Benchmark against baseline models (a sketch of this idea follows the list)
- Measure performance across multiple metrics
- Evaluate on standard benchmarks
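The harness itself and its configuration live in the LoRA Bakeoff repository; the snippet below is only a hedged sketch of the baseline-comparison idea, not the harness’s actual interface. It scores a fine-tuned adapter and the unadapted base model on a tiny labeled test set using exact-match accuracy. The deployment name, adapter reference, and test examples are placeholder assumptions.

```python
from predibase import Predibase

pb = Predibase(api_token="<PREDIBASE_API_TOKEN>")
client = pb.deployments.client("llama-3-1-8b-instruct")  # placeholder deployment

# Tiny illustrative test set: (prompt, expected completion) pairs.
test_set = [
    ("Classify the sentiment: 'Great battery life.' ->", "positive"),
    ("Classify the sentiment: 'Stopped working after a week.' ->", "negative"),
]

def exact_match_accuracy(adapter_id=None):
    """Share of examples where the model's output matches the label exactly."""
    correct = 0
    for prompt, label in test_set:
        kwargs = {"max_new_tokens": 8}
        if adapter_id:
            kwargs["adapter_id"] = adapter_id  # omit to query the base model
        output = client.generate(prompt, **kwargs).generated_text
        correct += int(output.strip().lower() == label)
    return correct / len(test_set)

print("Base model accuracy:", exact_match_accuracy())
print("Adapter accuracy:   ", exact_match_accuracy(adapter_id="my-adapter-repo/1"))
```

Until native batch evaluation lands, a client-side loop like this is also the simplest way to score an adapter over a held-out test set.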
## Example Notebooks
The following notebooks provide examples of evaluating models on binary classification tasks or generation tasks using the Predibase API.
Native support for offline (batch) evaluation is coming soon.
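The notebooks above cover the full workflow end to end; as a simplified illustration of just the scoring step, the snippet below computes accuracy and F1 for a binary classification task with scikit-learn and ROUGE-L for a generation task with the `rouge-score` package. The label and text lists are made-up placeholders, and the metric choices are common defaults rather than anything mandated by Predibase.

```python
from sklearn.metrics import accuracy_score, f1_score
from rouge_score import rouge_scorer

# --- Binary classification: compare predicted vs. true labels (placeholders). ---
y_true = ["positive", "negative", "positive", "negative"]
y_pred = ["positive", "negative", "negative", "negative"]
print("Accuracy:", accuracy_score(y_true, y_pred))
print("F1 (positive class):", f1_score(y_true, y_pred, pos_label="positive"))

# --- Generation: score each prediction against its reference with ROUGE-L. ---
scorer = rouge_scorer.RougeScorer(["rougeL"], use_stemmer=True)
references = ["The invoice is due on March 3rd."]
predictions = ["The invoice is due March 3rd."]
for ref, pred in zip(references, predictions):
    score = scorer.score(ref, pred)["rougeL"].fmeasure
    print(f"ROUGE-L F1: {score:.3f}")
```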