Fine-Tuning Llama 3.2 1B with Unsloth

In this article, we will fine-tune the Llama 3.2 1B model with Unsloth on the Spider 1.0 text-to-SQL dataset. The goal is to improve the SQL generation capabilities of a general-purpose Llama 3.2 1B model.

Prerequisites

Before we get started, we assume that the reader has access to a GPU they can use for training, as well as a working Python installation.

Creating the Environment

First, we want to set up a Python virtual environment (venv) for the project and install the necessary libraries. To create a Python venv, we can run the following command:

python -m venv venv

This will create a new directory called venv in the current working directory. We can then activate the environment by running the following command:

source venv/bin/activate

Now, we want to install the necessary libraries. Please copy and paste the following requirements.txt file into your local development environment:

requirements.txt

unsloth
torch
transformers
trl

To install these dependencies, we can run the following command:

pip install -r requirements.txt

Now we have a working environment and can start fine-tuning our model. To do so, create a Jupyter notebook and import the required libraries:

import torch

from unsloth import FastLanguageModel
from unsloth.chat_templates import get_chat_template
from unsloth import is_bfloat16_supported
from transformers import TrainingArguments, DataCollatorForSeq2Seq
from trl import SFTTrainer

Some of these imports may be unfamiliar, so here is a brief description of each:

Library Description
unsloth.FastLanguageModel This class is used to load the model. It behaves a lot like Hugging Face's AutoModelForCausalLM class, but with improved memory and speed efficiency.
unsloth.chat_templates.get_chat_template This function returns the chat template for the model, which is used to format the model's inputs and outputs.
unsloth.is_bfloat16_supported This function checks whether bfloat16 precision is supported by your hardware.
transformers.TrainingArguments This class configures the training hyperparameters, such as batch size, learning rate, and number of steps.
transformers.DataCollatorForSeq2Seq This class pads and batches variable-length sequences into training batches.
trl.SFTTrainer This class runs the supervised fine-tuning (SFT) loop for the model.
torch This is the standard deep learning library, also known as PyTorch.

Starting the Fine-Tuning Process

We are finally ready to start the process and import our first model.

Loading the Model

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "unsloth/Llama-3.2-1B-Instruct",
    max_seq_length = 2048,
    load_in_4bit = True,
    dtype = None
)

We use the FastLanguageModel class to load the model. You may notice several arguments passed during loading; they customize the model's behavior and are described below:

Argument Description
model_name The name of the model to load. In this case, we are loading a Llama 3.2 1B Instruct model hosted by Unsloth. "Instruct" means the model has been fine-tuned to follow instructions.
max_seq_length The maximum sequence length the model can process at once. We set it to 2048, which comfortably fits most SQL prompts. A larger sequence length requires more memory and time to process, and vice versa for smaller ones.
load_in_4bit Setting this flag to True enables 4-bit quantization, which reduces the weights from their original precision to 4 bits and shrinks the model's memory footprint.
dtype Setting dtype to None lets the library automatically pick the best data type for your hardware.
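To get a feel for what 4-bit quantization buys, here is a rough back-of-the-envelope estimate, assuming roughly one billion weights and ignoring activations, optimizer state, and quantization overhead:

```python
# Approximate weight memory for a ~1B-parameter model.
# Ignores activations, KV cache, and per-group quantization overhead.
params = 1_000_000_000

fp16_gb = params * 2 / 1e9    # 16-bit floats: 2 bytes per weight
int4_gb = params * 0.5 / 1e9  # 4-bit weights: 0.5 bytes per weight

print(fp16_gb)  # 2.0
print(int4_gb)  # 0.5
```

So the quantized weights take roughly a quarter of the memory, which is what makes fine-tuning a 1B model feasible on a single consumer GPU.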

Now that we have loaded the model, we can create the configuration parameters for the PEFT LoRA fine-tuning process.

Creating the Configuration Parameters

model = FastLanguageModel.get_peft_model(
    model,
    r = 16, # The rank dimension. Lower rank means less memory usage, e.g. 8, 16, 32, 64, 128
    target_modules = ["q_proj", "k_proj", "v_proj", "o_proj",
    "gate_proj", "up_proj", "down_proj",],
    lora_alpha = 16,
    lora_dropout = 0,
    bias = "none",   
    use_gradient_checkpointing = "unsloth", # True or "unsloth" for very long context
    random_state = 3407,
    use_rslora = False,  # Rank-stabilized LoRA (not used here)
    loftq_config = None, # LoftQ quantization config (not used here)
)

As before, here is a description of each argument in the configuration above:

Argument Description
r The rank dimension. A lower rank uses less memory, but also gives the adapter less capacity to adapt.
target_modules The modules to apply LoRA to. Here we apply it to the attention projections (q, k, v, o) and the MLP projections (gate, up, down).
lora_alpha The LoRA alpha value, a scaling factor that controls the strength of the LoRA update.
lora_dropout The dropout rate applied to the LoRA layers, used to prevent overfitting.
bias Which bias terms to train. "none" trains no biases, which saves memory.
use_gradient_checkpointing The gradient checkpointing method to use. Unsloth's implementation is optimized for long contexts.
random_state Sets a random seed for reproducibility.
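To get a feel for why r = 16 is cheap, consider a single square 2048 × 2048 projection layer (an illustrative size, not an exact Llama 3.2 1B shape). Full fine-tuning updates all d² weights, while LoRA only trains the two low-rank factors A (r × d) and B (d × r):

```python
# Back-of-the-envelope: trainable parameters for one square projection.
d = 2048  # hidden dimension (illustrative)
r = 16    # LoRA rank, as in the configuration above

full_ft = d * d       # parameters updated by full fine-tuning
lora = 2 * r * d      # LoRA trains A (r x d) plus B (d x r)

print(full_ft)        # 4194304
print(lora)           # 65536
print(full_ft / lora) # 64.0
```

A 64x reduction per layer, which is why LoRA fits comfortably in the memory left over after loading the quantized base weights.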

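Before we can call the trainer, we need to load the Spider dataset, format it into prompt/response text, and construct an SFTTrainer. The following is a minimal sketch; the dataset formatting, field names, and hyperparameters here are illustrative assumptions rather than the exact setup, and the SFTTrainer keyword arguments vary between trl versions:

```python
# Sketch of the trainer setup (hyperparameters are assumptions, not tuned values).
from datasets import load_dataset
from transformers import TrainingArguments, DataCollatorForSeq2Seq
from trl import SFTTrainer
from unsloth import is_bfloat16_supported

# Spider 1.0 pairs natural-language questions with gold SQL queries.
dataset = load_dataset("spider", split="train")

def format_example(example):
    # Turn each (question, query) pair into a single training string.
    return {"text": f"Question: {example['question']}\nSQL: {example['query']}"}

dataset = dataset.map(format_example)

trainer = SFTTrainer(
    model = model,
    tokenizer = tokenizer,
    train_dataset = dataset,
    dataset_text_field = "text",
    max_seq_length = 2048,
    data_collator = DataCollatorForSeq2Seq(tokenizer = tokenizer),
    args = TrainingArguments(
        per_device_train_batch_size = 2,
        gradient_accumulation_steps = 4,
        max_steps = 60,
        learning_rate = 2e-4,
        fp16 = not is_bfloat16_supported(),
        bf16 = is_bfloat16_supported(),
        logging_steps = 1,
        optim = "adamw_8bit",
        output_dir = "outputs",
    ),
)
```
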
Once the SFTTrainer has been constructed with the model, tokenizer, and dataset, starting the training process is as simple as the following line of code:

stats = trainer.train() 

This starts the training process and prints the loss at every step. You should see a sharp decrease in loss early on that flattens out in the later steps. After a few minutes, the run finishes. You have now successfully fine-tuned the Llama 3.2 1B model with Unsloth to perform SQL tasks!