Adapters are a parameter-efficient way to fine-tune a model on a new task. Instead of updating all of the model's weights, you add a small number of task-specific parameters on top of the frozen base model.

Predibase provides different adapter types that can be used to improve model performance on a specific task (LoRA), speed up model throughput with speculative decoding (Turbo), or both (Turbo LoRA).

Create an Adapter

Creating a fine-tuned adapter in Predibase requires three things:

  1. A dataset that contains examples of the task you want to fine-tune the model on (a minimal example follows this list).
  2. An adapter repository to store and track your adapter versions (experiments).
  3. A configuration object that specifies the hyperparameters for the fine-tuning job.
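
For example, an instruction fine-tuning dataset is typically a CSV with a prompt column and a completion column (the exact column names expected for each task type are described in the Tasks guide, so treat the ones below as an illustrative assumption):

prompt,completion
"Classify the sentiment: great battery life","positive"
"Classify the sentiment: the screen cracked after a week","negative"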

Python SDK

Fine-tuning jobs can be started using the Predibase Python SDK:

Synchronous Fine-Tuning API

from predibase import Predibase, SFTConfig

pb = Predibase(api_token="<API_TOKEN>")

# Connect a dataset
dataset = pb.datasets.from_file("/path/to/dataset.csv", name="my_dataset")

# Create an adapter repository
repo = pb.repos.create(name="my-repo", description="Test experiments", exists_ok=True)

# Start a fine-tuning job with LoRA
adapter = pb.adapters.create(
    config=SFTConfig(
        base_model="llama-3-1-8b-instruct",
    ),
    dataset=dataset,
    repo=repo
)

Once the training job is created, it typically waits 3-5 minutes in a queue while compute resources become available and are assigned to the job. After that, training begins and the training logs are streamed to your console.

Asynchronous Fine-Tuning API

By default, creating an adapter is a blocking (synchronous) call. To create an adapter asynchronously, use the fine-tuning Jobs API with watch=False:

ft_job = pb.finetuning.jobs.create(
    config=SFTConfig(
        base_model="llama-3-1-8b-instruct",
    ),
    dataset=dataset,
    repo=repo,
    watch=False,
)
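
Once the job finishes, you can load the resulting adapter by its repository name and version number. The exact accessor can vary by SDK version; the call below is a sketch that assumes an adapters getter keyed by "repo-name/version":

# Sketch only: the "repo/version" lookup below is an assumption about the SDK surface,
# not shown elsewhere in this guide. Check the SDK reference for the exact call.
adapter = pb.adapters.get("my-repo/1")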

Web UI

In the Predibase UI, you can start a fine-tuning job by navigating to the Adapters tab, creating a repository, and then creating a new version within the repository.

The UI supports most, but not all, of the parameters available within the Python SDK. For more advanced use cases, we recommend using the Python SDK.

For information about the different types of fine-tuning tasks supported (instruction fine-tuning, text completion, chat), see our Tasks guide.

Adapter Types

LoRA

LoRA (Low-Rank Adaptation) is the default adapter type that improves model quality and alignment with your task. It trains a small set of low-rank matrices while keeping the base model frozen, making it highly efficient.

To start fine-tuning with LoRA:

adapter = pb.adapters.create(
    config=SFTConfig(
        base_model="llama-3-1-8b-instruct",
        adapter="lora",
        rank=16,
    ),
    dataset=dataset,  # Your dataset
    repo=repo,        # Your adapter repository
    description="LoRA fine-tuning with custom rank"
)

Turbo LoRA

Turbo LoRA is a proprietary method developed by Predibase that combines the benefits of LoRA fine-tuning (for quality) with speculative decoding (for speed). Depending on the downstream task, it can increase inference throughput (measured in tokens generated per second) by up to 3.5x for single requests and up to 2x for high-queries-per-second batched workloads. Instead of predicting just one token at a time, speculative decoding lets the model predict and verify several future tokens in a single decoding step, significantly accelerating generation and making it well suited for tasks with long output sequences.

To start fine-tuning with Turbo LoRA:

adapter = pb.adapters.create(
    config=SFTConfig(
        base_model="llama-3-1-8b-instruct",
        adapter="turbo_lora",
    ),
    dataset=dataset,  # Your dataset
    repo=repo,        # Your adapter repository
    description="Turbo LoRA fine-tuning"
)

Turbo

Turbo is a speculative decoding adapter that speeds up inference without changing the model’s outputs. It trains additional layers to predict multiple tokens in parallel, which makes it useful for accelerating an existing base model or an already fine-tuned LoRA adapter. Note that Turbo adapters don’t train a LoRA, so the LoRA-specific parameters (rank, alpha, dropout, and target modules) don’t apply.

To start fine-tuning with Turbo:

adapter = pb.adapters.create(
    config=SFTConfig(
        base_model="llama-3-1-8b-instruct",
        adapter="turbo",
    ),
    dataset=dataset,  # Your dataset
    repo=repo,        # Your adapter repository
    description="Turbo fine-tuning"
)

See the Continue Training guide for how to continue training an existing LoRA adapter.

When fine-tuning with any of these adapter types, you can use the apply_chat_template flag to automatically format your prompts with the base model’s chat template. This is particularly useful when fine-tuning instruction-tuned models.

Chat Templates

Open source models come in base (e.g., Llama-3-8B) and instruct (e.g., Llama-3-8B-Instruct) versions. Instruct versions are trained with chat templates that provide consistent instruction formatting. Using model-specific chat templates typically improves fine-tuning performance, especially when fine-tuning instruction-tuned models.
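
To see what a chat template actually produces, the sketch below renders a short conversation with the Hugging Face transformers tokenizer for an instruct model (the model ID here is only an example; during fine-tuning, Predibase applies the template for you when apply_chat_template is enabled):

from transformers import AutoTokenizer

# Example instruction-tuned checkpoint; any model that ships a chat template works.
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3-8B-Instruct")

messages = [
    {"role": "user", "content": "Classify the sentiment: great battery life"},
    {"role": "assistant", "content": "positive"},
]

# Render the conversation with the model's own role markers and special tokens.
print(tokenizer.apply_chat_template(messages, tokenize=False))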

Using Chat Templates

For fine-tuning, apply_chat_template is supported in the SFTConfig:

from predibase import SFTConfig

config = SFTConfig(
    base_model="qwen3-8b",
    adapter="lora",  # default: "lora"
    epochs=1,  # default: 3
    rank=8,  # default: 16
    learning_rate=0.0001,  # default: 0.0002
    target_modules=["q_proj", "v_proj", "k_proj"],  # default: None (infers [q_proj, v_proj] for qwen3-8b)
    apply_chat_template=True,  # default: False
)

When this parameter is set to True, each training sample in the dataset will automatically have the model’s chat template applied to it. Note that this parameter is only supported for instruction and chat fine-tuning, not continued pretraining.

Inference with a Chat Template

If your model was trained with apply_chat_template set to True, use only the OpenAI-compatible method to query the model, since it automatically applies the chat template to your inputs. You can see sample code in the Python SDK example.
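
As a rough sketch of what that looks like with the OpenAI Python client (the base URL, deployment name, and adapter identifier below are placeholders rather than values from this guide; see the Python SDK example for the exact ones):

from openai import OpenAI

# Placeholders: substitute your tenant/deployment URL and your adapter's "repo/version"
# as shown in the Python SDK example.
client = OpenAI(
    api_key="<PREDIBASE_API_TOKEN>",
    base_url="https://serving.app.predibase.com/<tenant>/deployments/v2/llms/<deployment>/v1",
)

response = client.chat.completions.create(
    model="my-repo/1",  # adapter identifier: repository name / version
    messages=[{"role": "user", "content": "Classify the sentiment: great battery life"}],
)
print(response.choices[0].message.content)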

Next Steps