Fine-Tuning Example

This guide will show you how to easily fine-tune LLMs in Predibase, as well as how to customize your fine-tuning runs using helpful configuration templates.

Setup

If you're using the Python SDK, you'll first need to initialize your PredibaseClient object. All SDK examples below will assume this has already been done. Make sure you've installed the SDK and configured your API token as well.

from predibase import PredibaseClient

# If you've already run `pbase login`, you don't need to provide any credentials here.
#
# If you're running in an environment where the `pbase login` command-line tool isn't available,
# you can also set your API token using `pc = PredibaseClient(token="<your token here>")`.
pc = PredibaseClient()

This guide also assumes you already have a fine-tuning dataset connected in Predibase. If you don't already have such a dataset, check out this section of the LLM quickstart for an example of uploading a dataset file to Predibase.

Select a base LLM

Currently, we support fine-tuning all open-source LLMs that meet these criteria.

llm = pc.LLM("hf://meta-llama/Llama-2-7b-hf")

Create a prompt template

In this guide, we'll be using a portion of the Code Alpaca dataset that we've already uploaded to Predibase. If we go to the Data section of the Predibase app and click on the dataset, we can see its columns (instruction, input, and output) and some sampled data.

Since the Code Alpaca dataset is intended to teach an LLM how to generate code snippets (the output column) given an instruction and optional input, our prompt template might look something like this:

# Note the 4-space indentation, which is necessary for the YAML templating.
prompt_template = """Below is an instruction that describes a task, paired with an input
    that may provide further context. Write a response that appropriately
    completes the request.

    ### Instruction: {instruction}

    ### Input: {input}

    ### Response:
"""

During training, the {instruction} and {input} template variables are automatically replaced by the corresponding column values for each row in the dataset. Make sure the variable names match the column names in your chosen dataset!
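The substitution described above can be sketched with plain Python string formatting. This is only an illustration under stated assumptions, not Predibase's internal implementation: the helpers `template_fields` and `render_prompt`, the shortened template, and the sample row are all hypothetical. The sketch also shows one way to check, before launching a job, that every template variable has a matching dataset column.

```python
from string import Formatter

# Shortened, hypothetical template using the same placeholders as the guide.
TEMPLATE = "### Instruction: {instruction}\n\n### Input: {input}\n\n### Response:\n"


def template_fields(template: str) -> set:
    """Return the set of {placeholder} names used in a template."""
    return {name for _, name, _, _ in Formatter().parse(template) if name}


def render_prompt(template: str, row: dict) -> str:
    """Fill a template from one dataset row, failing early on unknown columns."""
    missing = template_fields(template) - set(row)
    if missing:
        raise KeyError(f"template variables without a matching column: {sorted(missing)}")
    return template.format(**row)


# Hypothetical row with the same column names as the Code Alpaca dataset.
row = {
    "instruction": "Write a function that adds two numbers.",
    "input": "",
    "output": "def add(a, b):\n    return a + b",
}
prompt = render_prompt(TEMPLATE, row)
print(prompt)
```

A template that references a column the dataset does not have, such as `{question}` here, raises a `KeyError` instead of silently producing a malformed prompt.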

Fine-tuning with defaults

The quickest way to start fine-tuning is to use Predibase's default configuration, which is pre-initialized with sensible settings for you.

job = llm.finetune(
    prompt_template=prompt_template,
    target="output",  # The column that the model should learn to generate.
    dataset="file_uploads/code_alpaca_800",
)

# Wait for model training to complete
model = job.get()

Customizing fine-tuning parameters

If you would like to customize additional parameters, you have two options:

Option 1: Use llm.finetune

Commonly used modeling parameters such as epochs and learning rate can be set in-line.

job = llm.finetune(
    prompt_template=prompt_template,
    target="output",  # The column that the model should learn to generate.
    dataset="file_uploads/code_alpaca_800",
    epochs=5,
    # train_steps=10000,  # An alternative to epochs for specifying a training runway.
    learning_rate=0.0002,
)

# Wait for model training to complete
model = job.get()
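To relate the two ways of specifying training length, here is a rough sketch of how many optimizer steps a given number of epochs corresponds to. It rests on assumptions not stated in this guide: that the dataset has 800 rows (inferred only from its name), that every row is used for training with no validation split, that the batch size is 1, and that there is no gradient accumulation. The helper `steps_for_epochs` is hypothetical.

```python
import math


def steps_for_epochs(num_rows: int, batch_size: int, epochs: int) -> int:
    """Optimizer steps needed to pass over `num_rows` examples `epochs` times."""
    steps_per_epoch = math.ceil(num_rows / batch_size)
    return steps_per_epoch * epochs


# Assumed values: 800 training rows and a batch size of 1.
total_steps = steps_for_epochs(num_rows=800, batch_size=1, epochs=5)
print(total_steps)  # 4000
```

Under these assumptions, `epochs=5` would cover about 4000 steps, so a `train_steps` value of 10000 would train for noticeably longer.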

Option 2: Modify the underlying config

If you're looking to modify other config parameters not listed above, you can modify the underlying Ludwig configuration via templates.

# Get the set of fine-tuning templates for the base LLM
llm = pc.LLM("hf://meta-llama/Llama-2-7b-hf")
tmpls = llm.get_finetune_templates()

# Quickly compare the templates
tmpls.compare()
# ┌───────────┬────────────┬──────────────────────────────────────────────────────────────────┐
# │ Default   │ Name       │ Description                                                      │
# ├───────────┼────────────┼──────────────────────────────────────────────────────────────────┤
# │           │ lora_bf16  │ Fine-tunes by loading the model weights in half precision,       │
# │           │            │ but without quantization. This is slower to train and requires   │
# │           │            │ a larger and more expensive GPU, but typically provides better   │
# │           │            │ fine-tuned model performance compared to quantized training.     │
# ├───────────┼────────────┼──────────────────────────────────────────────────────────────────┤
# │ ------>   │ qlora_4bit │ Emphasizes a smaller memory footprint through 4-bit quantization │
# │           │            │ of the model weights. This typically leads to fast fine-tuning   │
# │           │            │ using smaller GPU instances, but with a minor decrease in        │
# │           │            │ fine-tuned model performance.                                    │
# ├───────────┼────────────┼──────────────────────────────────────────────────────────────────┤

# Get the config for a corresponding template
cfg = tmpls["qlora_4bit"].to_config(prompt_template="{instruction}", target="response")

cfg["adapter"]["r"] = 8
cfg["adapter"]["dropout"] = 0.1
cfg["adapter"]["alpha"] = 16

cfg["trainer"]["batch_size"] = 1

ft_job = llm.finetune(
    repo="my-finetuned-model",
    dataset="file_uploads/code_alpaca_800",  # The same dataset used in the examples above.
    config=cfg,
    epochs=1,
)
Note

In-line arguments (Option 1) take precedence over config values (Option 2). In the example above, epochs=1 overrides any epochs value set in cfg.
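The precedence rule can be pictured as a simple merge in which in-line arguments are applied on top of the config. The sketch below is not the SDK's actual logic. The helper `apply_inline_overrides` is hypothetical, and it assumes that in-line arguments such as `epochs` and `learning_rate` map onto the `trainer` section of the config.

```python
import copy


def apply_inline_overrides(config: dict, **inline_args) -> dict:
    """Return a copy of `config` in which in-line arguments win over config values.

    Hypothetical helper: assumes in-line arguments belong to the `trainer` section.
    """
    merged = copy.deepcopy(config)
    trainer = merged.setdefault("trainer", {})
    for key, value in inline_args.items():
        if value is not None:  # only arguments that were actually passed override
            trainer[key] = value
    return merged


cfg = {"adapter": {"r": 8}, "trainer": {"batch_size": 1, "epochs": 3}}
effective = apply_inline_overrides(cfg, epochs=1, learning_rate=None)
print(effective["trainer"])  # {'batch_size': 1, 'epochs': 1}
```

The original `cfg` is left unchanged, and values that are only set in the config, such as `batch_size`, are kept as they are.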

Download your fine-tuned LLM

Since we're using adapter-based fine-tuning, the exported model files will contain only the adapter weights, not the full LLM weights.

# Replace the placeholder with your repo name. A model version number can be passed as an optional second argument.
model = pc.get_model("<your_finetuned_model_repo_name>")
model.download(name="llm.zip", location="/path/to/folder")
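To get a feel for why an adapter-only export is so much smaller than the full model, here is a back-of-the-envelope estimate. It is a sketch under explicit assumptions rather than a description of what Predibase exports: it assumes a LoRA rank of 8, adapters on only the query and value projections of each layer, 16-bit storage, and approximate Llama-2-7b dimensions. The helper `lora_param_count` is hypothetical.

```python
def lora_param_count(d_in: int, d_out: int, rank: int) -> int:
    """Parameters LoRA adds to one d_out x d_in weight matrix.

    LoRA learns two low-rank factors: A (rank x d_in) and B (d_out x rank).
    """
    return rank * (d_in + d_out)


# Assumptions for illustration (not read from Predibase):
HIDDEN_SIZE = 4096      # Llama-2-7b hidden dimension
NUM_LAYERS = 32         # Llama-2-7b transformer layers
ADAPTED_PER_LAYER = 2   # assume LoRA on the query and value projections only
BASE_PARAMS = 6.7e9     # approximate parameter count of the base model
BYTES_PER_PARAM = 2     # 16-bit weights

per_matrix = lora_param_count(HIDDEN_SIZE, HIDDEN_SIZE, rank=8)
adapter_params = NUM_LAYERS * ADAPTED_PER_LAYER * per_matrix
adapter_mb = adapter_params * BYTES_PER_PARAM / 1e6
base_gb = BASE_PARAMS * BYTES_PER_PARAM / 1e9

print(f"adapter: {adapter_params:,} parameters, about {adapter_mb:.1f} MB")
print(f"base model: about {base_gb:.1f} GB")
```

Under these assumptions the adapter holds roughly 4.2 million parameters (a few megabytes), compared with several gigabytes for the base weights. The actual size of your download depends on the rank and on which modules the adapter targets.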