Continue Training (Beta)

After fine-tuning completes, you may choose to continue training on top of the fine-tuned adapter rather than starting from scratch. For example, you may want to see whether your adapter's performance improves with further training on the same data, or you may have a similar dataset you wish to train on using the existing adapter as a starting point.

Now, you can use any adapter you have previously fine-tuned as a starting point to continue training.

Starting a continued training run

To continue training on top of an existing run, simply provide the desired adapter ID as follows:

from predibase import Predibase, FinetuningConfig

pb = Predibase(api_token="<PREDIBASE_API_TOKEN>")

pb.adapters.create(
    config=FinetuningConfig(
        epochs=3,  # The maximum number of ADDITIONAL epochs to train for
        enable_early_stopping=False,
    ),
    continue_from_version="myrepo/3",  # The adapter version to resume training from
    dataset="mydataset",
    repo="myrepo",
)

Note that only epochs and enable_early_stopping are available as configurable parameters for continued training. All other parameters and hyperparameters are inherited from the original training run. Currently, training always resumes from the latest checkpoint of the base run; resuming from arbitrary checkpoints is not yet supported.

Training progress of the previous run will be preserved in this new continuation run, including checkpoints and metrics! Under the hood, training picks up exactly where it left off. The optimizer, learning rate scheduler, and RNG state are all restored with the checkpoint alongside the previous run's metrics.
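
Conceptually, this is the same as restoring a full training checkpoint rather than just the adapter weights. The sketch below illustrates the idea with generic PyTorch-style code; it is an assumption for illustration only, not the Predibase implementation, and the checkpoint file and its keys are hypothetical.

import torch
from torch import nn, optim

# Minimal stand-ins; in practice these are the adapter and its training objects.
model = nn.Linear(16, 1)
optimizer = optim.AdamW(model.parameters(), lr=1e-4)
scheduler = optim.lr_scheduler.StepLR(optimizer, step_size=1)

# Resuming restores the full training state saved with the checkpoint,
# not just the model weights.
checkpoint = torch.load("latest_checkpoint.pt")           # hypothetical checkpoint file
model.load_state_dict(checkpoint["model_state"])          # adapter weights
optimizer.load_state_dict(checkpoint["optimizer_state"])  # e.g. AdamW moment estimates
scheduler.load_state_dict(checkpoint["scheduler_state"])  # position in the LR schedule
torch.set_rng_state(checkpoint["rng_state"])              # RNG state for data ordering
start_epoch = checkpoint["epoch"] + 1                     # epoch counting continues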

Continued training on a different dataset

An existing run can also be used as the starting point for fine-tuning on a new dataset:

pb.adapters.create(
    config=FinetuningConfig(
        epochs=3,  # The maximum number of epochs to train for
        enable_early_stopping=False,
    ),
    continue_from_version="myrepo/3",  # The adapter version to resume training from
    dataset="mydataset-2",
    repo="myrepo",
)

Unlike continued fine-tuning on the same dataset, using a new dataset is like kicking off a fresh fine-tuning run with pre-trained adapter weights. Once again, epochs and enable_early_stopping are the only configurable hyperparameters. The adapter will be initialized from the final checkpoint of the base run.

Training progress from the base run, including metrics and checkpoints, is not preserved in this case. The optimizer, learning rate scheduler, and RNG state are all re-initialized to ensure the adapter is tailored to the new dataset.
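
If the new dataset has not been connected to Predibase yet, you can upload it first and pass the result as the dataset argument. A minimal sketch, assuming a local CSV file; the path and dataset name below are placeholders:

from predibase import Predibase

pb = Predibase(api_token="<PREDIBASE_API_TOKEN>")

# Upload the new dataset; the returned object (or its name) can then be
# passed as the `dataset` argument to pb.adapters.create above.
dataset = pb.datasets.from_file("/path/to/mydataset-2.csv", name="mydataset-2")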