Fine-Tuning LLMs
This guide will show you how to easily fine-tune LLMs in Predibase, as well as how to customize your fine-tuning runs using helpful configuration templates.
Setup
If you're using the Python SDK, you'll first need to initialize your PredibaseClient object. All SDK examples below assume this has already been done. Make sure you've installed the SDK and configured your API token as well.
from predibase import PredibaseClient
# If you've already run `pbase login`, you don't need to provide any credentials here.
#
# If you're running in an environment where the `pbase login` command-line tool isn't available,
# you can also set your API token using `pc = PredibaseClient(token="<your token here>")`.
pc = PredibaseClient()
This guide also assumes you already have a fine-tuning dataset connected in Predibase. If you don't, check out this section of the LLM quickstart for an example of uploading a dataset file to Predibase, or see the documentation on adding data to Predibase from sources like S3, Snowflake, and more.
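If you'd like to follow the same flow with a local file, a minimal sketch of an upload through the SDK might look like this (the upload_dataset method name and file name are assumptions based on the quickstart, so verify them against the linked documentation):

# Upload a local CSV file as a Predibase dataset.
# NOTE: `upload_dataset` and the file name are assumptions based on the
# LLM quickstart; check the linked documentation for the exact call.
dataset = pc.upload_dataset("code_alpaca_800.csv")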
Select a base LLM
Currently, we support fine-tuning for any open-source LLM that meets these criteria.
llm = pc.LLM("hf://meta-llama/Llama-2-7b-hf")
Create a prompt template
In this guide, we'll be using a portion of the Code Alpaca dataset that we've already uploaded to Predibase. If we go to the Data section of the Predibase app and click on the dataset, we can see its columns and some sampled data.
Since the Code Alpaca dataset is intended to teach an LLM how to generate code snippets (the output column) given an instruction and an optional input, our prompt template might look something like this:
- Python SDK
- CLI
# Note the 4-space indentation, which is necessary for the YAML templating.
prompt_template = """Below is an instruction that describes a task, paired with an input
    that may provide further context. Write a response that appropriately
    completes the request.
    ### Instruction: {instruction}
    ### Input: {input}
    ### Response:
"""
# Note the 4-space indentation, which is necessary for the YAML templating.
read -r -d '' PROMPT_TEMPLATE << EOF
Below is an instruction that describes a task, paired with an input
    that may provide further context. Write a response that appropriately
    completes the request.
    ### Instruction: {instruction}
    ### Input: {input}
    ### Response:
EOF
During training, the {instruction} and {input} template variables will be automatically replaced by the corresponding column values for each row in the dataset. Make sure the variable names match the columns of your chosen dataset!
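To see exactly what a training example looks like, you can render the template for a single row yourself with Python's built-in str.format (the row values below are hypothetical):

# Hypothetical example row from the Code Alpaca dataset.
row = {
    "instruction": "Write a function that adds two numbers.",
    "input": "def add(a, b):",
}
# Each training example is produced by substituting the row's column values
# into the template, mirroring the substitution described above.
print(prompt_template.format(**row))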
Fine-tuning with defaults
The quickest way to start fine-tuning is to use Predibase's default configuration, which is pre-initialized with sensible settings for you.
- Python SDK
- CLI
job = llm.finetune(
    prompt_template=prompt_template,
    target="output",  # The column that the model should learn to generate.
    dataset="file_uploads/code_alpaca_800",
)

# Wait for model training to complete
model = job.get()
pbase finetune llm \
  --base-model hf://meta-llama/Llama-2-7b-hf \
  --repo-name my-llm \
  --prompt-template "$PROMPT_TEMPLATE" \
  --target output \
  --dataset file_uploads/code_alpaca_800 \
  --wait
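Once the job finishes, you can try out your adapter right away. Here's a hedged sketch following the LLM quickstart's pattern of attaching a fine-tuned adapter to a serverless base deployment (the deployment name and the with_adapter call are assumptions drawn from the quickstart, so verify them in your environment):

# NOTE: the deployment name and `with_adapter` usage below follow the LLM
# quickstart and may differ in your environment; treat them as assumptions.
base_deployment = pc.LLM("pb://deployments/llama-2-7b")
result = base_deployment.with_adapter(model).prompt(
    {
        "instruction": "Write a function that reverses a string.",
        "input": "",
    },
    max_new_tokens=256,
)
print(result.response)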
Customizing fine-tuning with different parameter values in-line
Commonly-used modeling parameters like epochs, batch size, and learning rate can be set in-line.
- Python SDK
- CLI
job = llm.finetune(
    prompt_template=prompt_template,
    target="output",  # The column that the model should learn to generate.
    dataset="file_uploads/code_alpaca_800",
    epochs=5,
    # train_steps=10000,  # An alternative to epochs for specifying a training runway.
    learning_rate=0.0002,
)

# Wait for model training to complete
model = job.get()
pbase finetune llm \
  --base-model hf://meta-llama/Llama-2-7b-hf \
  --repo-name my-llm \
  --prompt-template "$PROMPT_TEMPLATE" \
  --target output \
  --dataset file_uploads/code_alpaca_800 \
  --epochs 5 \
  --learning-rate 0.0002 \
  --wait
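If you prefer to reason in optimizer steps rather than epochs, the conversion is simple arithmetic; here's a small sketch (the dataset size and effective batch size below are hypothetical):

import math

# Hypothetical values: 800 training examples, effective batch size of 8.
num_examples = 800
batch_size = 8
epochs = 5

# One epoch is a single full pass over the dataset.
steps_per_epoch = math.ceil(num_examples / batch_size)  # 100
train_steps = epochs * steps_per_epoch                   # 500

You could then pass train_steps=500 instead of epochs=5 for an equivalent training runway.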
Customizing fine-tuning with additional templates
In addition to the default fine-tuning template, we offer other templates that let you tweak common settings like quantization without needing to write a configuration from scratch.
- Python SDK
- CLI
# Get the set of fine-tuning templates for the base LLM
tmpls = llm.get_finetune_templates()
# Quickly compare the templates
tmpls.compare()
# ┌───────────┬────────────┬─────────────────────────────────────────────────────────────────────┐
# │ Default │ Name │ Description │
# ├───────────┼────────────┼─────────────────────────────────────────────────────────────────────┤
# │ │ lora_bf16 │ Fine-tunes by loading the model weights in half precision, │
# │ │ │ but without quantization. This is slower to train and requires │
# │ │ │ a larger and more expensive GPU, but typically provides better │
# │ │ │ fine-tuned model performance compared to quantized training. │
# ├───────────┼────────────┼─────────────────────────────────────────────────────────────────────┤
# │ ------> │ qlora_4bit │ Emphasizes a smaller memory footprint through 4-bit quantization │
# │ │ │ of the model weights. This typically leads to fast fine-tuning │
# │ │ │ using smaller GPU instances, but with a minor decrease in │
# │ │ │ fine-tuned model performance. │
# ├───────────┼────────────┼─────────────────────────────────────────────────────────────────────┤
# │ │ qlora_8bit │ Fine-tuning with 8-bit quantization of the model weights. Slightly │
# │ │ │ slower and may require larger and more expensive GPUs to train than │
# │ │ │ qlora_4bit, but may improve fine-tuned model performance. │
# └───────────┴────────────┴─────────────────────────────────────────────────────────────────────┘
# Select a template of your choice and fine-tune!
my_tmpl = tmpls["lora_bf16"]
my_tmpl.run(
    prompt_template=prompt_template,
    target="output",
    dataset="file_uploads/code_alpaca_800",
)
# List out available templates
pbase finetune templates --base-model hf://meta-llama/Llama-2-7b-hf

pbase finetune llm \
  --base-model hf://meta-llama/Llama-2-7b-hf \
  --repo-name my-llm \
  --prompt-template "$PROMPT_TEMPLATE" \
  --target output \
  --template lora_bf16 \
  --dataset file_uploads/code_alpaca_800 \
  --wait
(Advanced) Customize a template
Advanced users only! Freely changing settings can cause training jobs to fail, run out of memory, or result in poor model quality.
If you are comfortable with LLM fine-tuning (especially fine-tuning via Ludwig), you can also modify the template configuration directly:
cfg: dict = my_tmpl.to_config(prompt_template=prompt_template, target="output")
# Modify cfg to your preference
cfg["trainer"]["learning_rate_scheduler"] = {"decay": "linear"}
cfg["trainer"]["optimizer"] = {"type": "adamw"}
cfg["preprocessing"]["global_max_sequence_length"] = 512
llm.finetune(
    config=cfg,
    prompt_template=prompt_template,
    target="output",
    dataset="file_uploads/code_alpaca_800",
    learning_rate=0.0002,
)
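Since to_config returns a plain Python dict, it's worth pretty-printing the final configuration before launching a run; a quick sketch using PyYAML (assuming pyyaml is installed in your environment):

import yaml

# Dump the full training config as YAML to sanity-check your overrides
# before submitting the job.
print(yaml.safe_dump(cfg, sort_keys=False))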
In-line arguments take precedence over config values. For example, if the config has:
trainer:
  learning_rate: 0.1
And you call:
llm.finetune(
    config=cfg,
    learning_rate=0.0002,
)
A learning rate of 0.0002 will be used.
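The merge semantics are easy to picture in plain Python (this is an illustration of the precedence rule, not Predibase's actual implementation):

# Illustration only: in-line keyword arguments override config values.
config = {"trainer": {"learning_rate": 0.1}}
inline_overrides = {"learning_rate": 0.0002}

effective = {**config["trainer"], **inline_overrides}
print(effective["learning_rate"])  # 0.0002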