ContinuedPretrainingConfig
Configuration options for continued pretraining
The ContinuedPretrainingConfig class provides configuration options for continued pretraining of language models using adapter-based approaches. It is required when creating a new adapter, as it defines the training parameters and hyperparameters used during the continued pretraining process.
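A minimal sketch of constructing the config is shown below; the import path and the base model name are assumptions, so adjust them to match your installation.

```python
# Import path is an assumption; adjust to the package that exposes the class.
from predibase import ContinuedPretrainingConfig

# Only base_model is required; every other hyperparameter falls back to its default.
config = ContinuedPretrainingConfig(
    base_model="my-base-model",  # placeholder; replace with a real base model identifier
)
```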
General Hyperparameters
Parameter | Type | Required | Default | Description |
---|---|---|---|---|
base_model | string | Yes | — | The base model to fine-tune
adapter | string | No | lora | The type of adapter to use. One of lora, turbo_lora, turbo
epochs | integer | No | 3 | Number of epochs to train for |
train_steps | integer | No | — | Number of training steps (overrides epochs if set)
learning_rate | float | No | 0.0002 | Learning rate for training |
enable_early_stopping | boolean | No | true | Whether to enable early stopping |
lr_scheduler | object | No | cosine_with_restarts | Learning rate scheduler configuration |
optimizer | object | No | adamw_torch | Optimizer configuration |
warmup_ratio | float | No | 0.03 | Ratio of training steps to use for warmup |
effective_batch_size | integer | No | 16 | Effective batch size for training |
apply_chat_template | boolean | No | false | Whether to apply the chat template |
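The general hyperparameters above map directly onto keyword arguments of the class. A hedged sketch overriding several of them (the import path and base model name are placeholders, not verified values):

```python
from predibase import ContinuedPretrainingConfig  # import path is an assumption

config = ContinuedPretrainingConfig(
    base_model="my-base-model",   # placeholder base model identifier
    adapter="turbo_lora",         # one of: lora, turbo_lora, turbo (default: lora)
    epochs=5,                     # default is 3; ignored if train_steps is set
    learning_rate=1e-4,           # default is 0.0002
    warmup_ratio=0.05,            # default is 0.03
    effective_batch_size=32,      # default is 16
    enable_early_stopping=False,  # default is true
    apply_chat_template=False,    # default is false
)
```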
LoRA Specific Hyperparameters
Parameter | Type | Required | Default | Description |
---|---|---|---|---|
rank | integer | No | 16 | Rank of the LoRA adapter |
target_modules | array[string] | No | Target modules in attention blocks | List of model modules to fine-tune |
lora_alpha | integer | No | 2 x rank | Alpha parameter for LoRA |
lora_dropout | float | No | 0.00 | Dropout rate for LoRA |
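The LoRA-specific hyperparameters can be set alongside the general ones. A sketch is shown below; the import path, base model name, and module names are illustrative assumptions (target module names vary by model architecture).

```python
from predibase import ContinuedPretrainingConfig  # import path is an assumption

config = ContinuedPretrainingConfig(
    base_model="my-base-model",           # placeholder base model identifier
    adapter="lora",
    rank=32,                              # default is 16
    lora_alpha=64,                        # default is 2 x rank
    lora_dropout=0.05,                    # default is 0.00
    target_modules=["q_proj", "v_proj"],  # example attention modules; varies by architecture
)
```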