Chat Templates

Overview

Open source models typically come in two versions:

  • a pre-trained base model (e.g., Llama-3-8B) and
  • an instruct version (e.g., Llama-3-8B-Instruct).

The instruct version undergoes further training with specific instructions using a chat template. These templates ensure clarity and consistency in instructions, leading to better fine-tuning outcomes. They also focus the model's learning on relevant aspects of the data.

Experimentation shows that using model-specific chat templates significantly boosts performance. While custom templates can be used, the model-specific instruction format usually yields better fine-tuning results on base models, and especially on instruction-tuned models.

Note: Instruction tuning templates are very sensitive to spacing and newline characters. It is also important to use the same instruction formatting at inference time that was used during training; otherwise you are very likely to see poor results.
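As a minimal illustration of this sensitivity, the two renderings below differ only in whitespace, yet they are different strings and therefore tokenize differently (the example strings are hypothetical):

```python
# Two renderings of the same instruction that differ only in whitespace.
# A model trained on one format often performs poorly when served the other.
trained_format = "<s>[INST] What is 2 + 2? [/INST] "
served_format = "<s>[INST]What is 2 + 2?[/INST]"
print(trained_format == served_format)  # False: the formats do not match
```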

See below for how to format your dataset for each of the base models we support. If a model is not listed, an instruct template may not be required.

[NEW] Automatically Apply the Chat Template For Finetuning

For fine-tuning, apply_chat_template is now supported in the FinetuningConfig:

config=FinetuningConfig(
    base_model="mistral-7b",
    adapter="turbo_lora",  # default: "lora"; Turbo LoRA is a proprietary fine-tuning method that greatly improves inference throughput for longer output token tasks.
    epochs=1,  # default: 3
    rank=8,  # default: 16
    learning_rate=0.0001,  # default: 0.0002
    target_modules=["q_proj", "v_proj", "k_proj"],  # default: None (infers ["q_proj", "v_proj"] for mistral-7b)
    apply_chat_template=True,  # default: False
),

When this parameter is set to True, each training sample in the dataset will automatically have the model's chat template applied to it. Note that this parameter is only supported for instruction and chat fine-tuning, not text completion.
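Conceptually, applying a chat template wraps each training sample's prompt in the model's special tokens before training. The sketch below illustrates this with the Mistral/Llama-2-style `[INST]` format; it is not Predibase's internal implementation, and `apply_inst_template` is a hypothetical helper name:

```python
# Illustrative sketch only: shows what "applying a chat template" does to one
# training sample. The exact template depends on the base model; this example
# uses the Mistral / Llama-2 style [INST] format.
def apply_inst_template(prompt: str, completion: str) -> dict:
    return {
        "prompt": f"<s>[INST] {prompt} [/INST] ",
        "completion": completion,
    }

sample = apply_inst_template(
    "What is the capital of France?",
    "The capital of France is Paris.",
)
print(sample["prompt"])
```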

Inference with a Chat Template

If your model was trained with apply_chat_template set to True, query it only through the OpenAI chat completions API, since the chat template is then automatically applied to your inputs:

from openai import OpenAI

api_token = "<PREDIBASE API TOKEN>"
tenant_id = "<PREDIBASE TENANT ID>"
model_name = "<DEPLOYMENT NAME>" # Ex. "mistral-7b"
adapter = "<ADAPTER REPO NAME>/<VERSION NUMBER>" # Ex. "adapter-repo/1"
base_url = f"https://serving.app.predibase.com/{tenant_id}/deployments/v2/llms/{model_name}/v1"

client = OpenAI(
    api_key=api_token,
    base_url=base_url,
)

content = "<YOUR PROMPT>"

completion = client.chat.completions.create(
    model=adapter,
    messages=[{"role": "user", "content": content}],
    max_tokens=100,
)
print("Completion result:", completion.choices[0].message.content)

Chat Templates for Base Models

Models

  • llama-2-7b-chat
  • llama-2-13b-chat
  • llama-2-70b-chat

Input Prompt Template

There are two options, depending on whether you want to also include a system message as part of the instruction.

With system message:

<s>[INST] <<SYS>>\n {system_message} \n<</SYS>>\n\n {prompt} [/INST]

Without system message:

<s>[INST] {prompt} [/INST]
Note: Make sure there is a space after the final [/INST] tag.

Output Prompt Template

You don't have to modify the completion column.

Example Data

Raw data without the template:

Prompt:
Please answer the following question: Natalia sold clips to 48 of her friends in April, and then she sold half as many clips in May. How many clips did Natalia sell altogether in April and May?

Completion:
Natalia sold 48/2 = <<48/2=24>>24 clips in May. Natalia sold 48+24 = <<48+24=72>>72 clips altogether in April and May. Answer: 72

Transformed data after applying the chat template without a system message:

Prompt:
<s>[INST] Please answer the following question: Natalia sold clips to 48 of her friends in April, and then she sold half as many clips in May. How many clips did Natalia sell altogether in April and May? [/INST]

Completion:
Natalia sold 48/2 = <<48/2=24>>24 clips in May. Natalia sold 48+24 = <<48+24=72>>72 clips altogether in April and May. Answer: 72