
✨ Fine-Tuning Migration

Quick Overview of Changes (For Existing Users)

👉 Required Actions

  1. Review the new dataset requirements and re-upload your datasets
  2. (SDK users) Update to the latest version of the SDK (pip install -U predibase) and review the SDK v2 changes below, as well as our updated docs (a quick version check follows this list)
  3. (SDK users) (Recommended) Check out the new End-to-End Example to get a run-down of the changes
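
    After upgrading, you can confirm the installed SDK version from Python. A minimal check using only the standard library:

        # Verify that the predibase SDK upgraded successfully
        from importlib.metadata import version

        print(version("predibase"))  # prints the installed version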

Overall

  • Up to 10x faster training times
  • New fine-tuning stack
    • Temporary move away from Ludwig configs ➡️ new Predibase fine-tuning configs
    • More stringent dataset format
    • Removal of "prompt_template" in the config
  • Adapters as first-class citizens

Existing Models

  • Existing fine-tuned adapters have been moved to the new Adapters view and remain usable
  • Custom, supervised ML models no longer accessible for non-VPC users

UI

  • Adapters page, for training and monitoring adapters
  • Models page no longer visible to non-VPC users
  • Learning curves are temporarily not available. If you would like to monitor your loss values, keep the adapter version page open during training and, upon completion, take a screenshot of the values in the blue message box. (You can also see metrics when training in the SDK.)
    • We'll be bringing back learning curves in the next release, targeted for May 1.

SDK

  • SDK users MUST update to the latest version of the SDK
  • New SDK v2 functions (a brief sketch follows this list)
  • SDK v1 functions will continue to work, except:
    • llm.finetune will NOT work
    • llm.prompt and llm.generate will only work for old adapters
  • New Quickstart and End-to-End Example guides
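
    For orientation, here's a minimal sketch of the SDK v2 flow (upload a dataset, fine-tune, then prompt). The file name, repo name, base model, and token below are placeholders; see the Quickstart and End-to-End Example for the full walkthrough:

        from predibase import Predibase, FinetuningConfig

        pb = Predibase(api_token="<YOUR_API_TOKEN>")  # placeholder token

        # Upload a dataset that follows the new two-column format
        dataset = pb.datasets.from_file("train.csv", name="my_dataset")

        # Create an adapter repo and kick off a fine-tuning job
        repo = pb.repos.create(name="my-adapter-repo", exists_ok=True)
        adapter = pb.adapters.create(
            config=FinetuningConfig(base_model="mistral-7b"),
            dataset=dataset,
            repo=repo,
        )

        # Prompt the base model with the newly trained adapter
        client = pb.deployments.client("mistral-7b")
        resp = client.generate("<your fully materialized prompt>",
                               adapter_id="my-adapter-repo/1")
        print(resp.generated_text)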

REST API

  • No changes to REST API interface for prompting

CLI

  • Authenticating - pbase login - will still work
  • Fine-tuning - pbase finetune llm - will NOT work
  • Prompting - pbase prompt llm - will continue to work for old adapters, but will NOT work for any new adapters
  • Deploy - pbase deploy llm - will continue to work for deploying base models
  • Upload dataset - pbase upload dataset - will continue to work

Original Announcement (Posted on April 4, 2024)

Hi Predibase community! We’d like to announce a big upcoming revamp of fine-tuning at Predibase, slated for launch in mid-April, that will dramatically improve the fine-tuning experience.

  • Faster training times: Last week we moved all fine-tuning jobs to A100s, which has already resulted in 2x-5x speedups. Based on our experiments, this additional revamp will deliver up to 10x faster training times.
  • Future support for new fine-tuning tasks: In this mid-April launch, we’ll continue to focus on instruction fine-tuning. This revamp will allow us to launch new fine-tuning use cases such as completions-style fine-tuning (aka domain adaptation) and DPO in the near future.

To support these changes and streamline the fine-tuning experience, we’re introducing a few key changes that all our users should be aware of:

  • Removal of prompt templates: We noticed that many users struggled with our existing prompt templating and the lack of visibility into when and how templates are applied. With prompt templates removed, fine-tuning now expects a fully materialized prompt as input, with any variables already inserted. At inference time, you’ll pass in the entire prompt, including any prompt template you used.
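
    For example, where a template was previously applied server-side, you now build the final prompt yourself before sending it. A small sketch (the template shown is illustrative, not a Predibase-provided one):

        # Materialize the prompt client-side before training or inference
        template = (
            "Below is an instruction that describes a task. "
            "Write a response that completes the request.\n\n"
            "### Instruction: {instruction}\n\n### Response:"
        )
        prompt = template.format(instruction="Summarize the following article: ...")
        # Pass `prompt` as-is; Predibase no longer inserts a template for you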

  • More stringent dataset format: To help prevent errors, datasets for instruction fine-tuning must now contain exactly two columns, named prompt and completion. (Note: Existing datasets that don’t conform to these requirements will no longer work.)
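
    For example, a conforming CSV can be produced with pandas (the rows below are made up for illustration):

        import pandas as pd

        # Exactly two columns: "prompt" (fully materialized) and "completion"
        df = pd.DataFrame({
            "prompt": ["### Instruction: Translate to French: Hello\n\n### Response:"],
            "completion": ["Bonjour"],
        })
        df.to_csv("train.csv", index=False)  # upload this file as your dataset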

  • New configs: We are moving off of Ludwig for our training services at Predibase, which means we’ll be using a new (but similar) config format. We plan to open-source this training framework in the second half of this year.
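
    Once the release lands, a config will be specified through the SDK roughly as follows. This is a sketch; the fields shown (epochs, rank, learning_rate) are common fine-tuning knobs used for illustration, not the final schema:

        from predibase import FinetuningConfig

        # A minimal Predibase fine-tuning config (illustrative fields)
        config = FinetuningConfig(
            base_model="mistral-7b",
            epochs=3,
            rank=16,            # LoRA rank
            learning_rate=2e-4,
        )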

  • Adapters as first-class citizens in the UI: A new Adapters page will replace the existing Models page and will be a dedicated place to manage your fine-tuning experiments and adapters. Existing models will be automatically migrated to the new Adapters page.

  • SDK update required: We’ve been working to improve the user experience of our SDK, which means new functions. The existing fine-tuning functions will no longer work, and users will need to update their SDK version. All other functions, such as those for inference, will continue to work without disruption.

  • Non-adapter-based fine-tuning only available for VPC customers: We are doubling down on adapter-based fine-tuning, starting with LoRA, which will be the primary user journey across our interfaces (UI / SDK / REST API). As a result, other types of model training, such as custom supervised ML models, will no longer be available to non-VPC tier customers.

To reiterate, these changes will only take effect AFTER the mid-April release. There are no immediate actions required. Rest assured that we will provide more details and updated documentation as we get closer to the release date. To stay up to date, you can track the latest updates on this page.

Lastly, we’re thrilled that you chose Predibase and couldn’t be more excited to bring you these changes in the coming weeks! If you have any immediate comments or concerns, please don’t hesitate to reach out to us at support@predibase.com.