Fine-Tuning Llama 3.2 1B with Unsloth

In this article, we will fine-tune the Llama 3.2 1B model with Unsloth on the Spider 1.0 text-to-SQL dataset. The goal is to improve the SQL generation capabilities of a general-purpose Llama 3.2 1B model.

Prerequisites

Before we get started, we assume that the reader has access to a GPU they can use for training, as well as a working Python installation.

Creating the Environment

First, we want to set up a Python virtual environment (venv) for the project and install the necessary libraries. To create a Python venv, we can run the following command:

python -m venv venv

This will create a new directory called venv in the current working directory. We can then activate the environment by running the following command:

source venv/bin/activate

Now, we want to install the necessary libraries. Please copy and paste the following requirements.txt file into your local development environment:

requirements.txt

unsloth
torch
transformers
trl

To install these dependencies, we can run the following command:

pip install -r requirements.txt

Now we have a working environment and can start fine-tuning our model. To do so, create a Jupyter notebook and import the required libraries:

import torch

from unsloth import FastLanguageModel
from unsloth.chat_templates import get_chat_template
from unsloth import is_bfloat16_supported
from transformers import TrainingArguments, DataCollatorForSeq2Seq
from trl import SFTTrainer

Some of these imports may be unfamiliar, so here is a brief description of each:

Library Description
unsloth.FastLanguageModel This class is used to load the model. It behaves a lot like Hugging Face's AutoModelForCausalLM class, but with improved memory and speed efficiency.
unsloth.chat_templates.get_chat_template This function returns the chat template for the model, which is used to format the model's inputs and outputs.
unsloth.is_bfloat16_supported This function checks whether bfloat16 precision is supported by your hardware.
transformers.TrainingArguments This class configures the training hyperparameters, such as batch size, learning rate, and number of steps.
transformers.DataCollatorForSeq2Seq This class pads and batches variable-length sequences into training batches.
trl.SFTTrainer This class runs the supervised fine-tuning (SFT) loop for the model.
torch This is the standard deep learning library, also known as PyTorch.

Starting the Fine-Tuning Process

We are finally ready to start the process and import our first model.

Loading the Model

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "unsloth/Llama-3.2-1B-Instruct",
    max_seq_length = 2048,
    load_in_4bit = True,
    dtype = None
)

We use the FastLanguageModel class to load the model. You may notice several arguments passed during loading; they customize the model's behavior and are described below:

Argument Description
model_name The name of the model to load. In this case, we are loading a Llama 3.2 1B Instruct model hosted by Unsloth. "Instruct" means the model has been fine-tuned to follow instructions.
max_seq_length The maximum sequence length the model can process at once. We set it to 2048, which comfortably fits most SQL prompts. A larger sequence length requires more memory and time to process, and vice versa for smaller ones.
load_in_4bit Setting this flag to True enables 4-bit quantization, which reduces the weights from their original precision to 4 bits and shrinks the model's memory footprint.
dtype Setting dtype to None lets the library automatically pick the best data type for your hardware.
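To get a feel for what 4-bit quantization buys, here is a rough back-of-the-envelope estimate, assuming roughly one billion weights and ignoring activations, optimizer state, and quantization overhead:

```python
# Approximate weight memory for a ~1B-parameter model.
# Ignores activations, KV cache, and per-group quantization overhead.
params = 1_000_000_000

fp16_gb = params * 2 / 1e9    # 16-bit floats: 2 bytes per weight
int4_gb = params * 0.5 / 1e9  # 4-bit weights: 0.5 bytes per weight

print(fp16_gb)  # 2.0
print(int4_gb)  # 0.5
```

So the quantized weights take roughly a quarter of the memory, which is what makes fine-tuning a 1B model feasible on a single consumer GPU.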

Now that we have loaded the model, we can create the configuration parameters for the PEFT LoRA fine-tuning process.

Creating the Configuration Parameters

model = FastLanguageModel.get_peft_model(
    model,
    r = 16, # The rank dimension. Lower rank means less memory usage, e.g. 8, 16, 32, 64, 128
    target_modules = ["q_proj", "k_proj", "v_proj", "o_proj",
    "gate_proj", "up_proj", "down_proj",],
    lora_alpha = 16,
    lora_dropout = 0,
    bias = "none",   
    use_gradient_checkpointing = "unsloth", # True or "unsloth" for very long context
    random_state = 3407,
    use_rslora = False,  # Rank-stabilized LoRA (not used here)
    loftq_config = None, # LoftQ quantization config (not used here)
)

As before, here is a description of each argument in the configuration above:

Argument Description
r The rank dimension. A lower rank uses less memory, but also gives the adapter less capacity to adapt.
target_modules The modules to apply LoRA to. Here we apply it to the attention projections (q, k, v, o) and the MLP projections (gate, up, down).
lora_alpha The LoRA alpha value, a scaling factor that controls the strength of the LoRA update.
lora_dropout The dropout rate applied to the LoRA layers, used to prevent overfitting.
bias Which bias terms to train. "none" trains no biases, which saves memory.
use_gradient_checkpointing The gradient checkpointing method to use. Unsloth's implementation is optimized for long contexts.
random_state Sets a random seed for reproducibility.
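To get a feel for why r = 16 is cheap, consider a single square 2048 × 2048 projection layer (an illustrative size, not an exact Llama 3.2 1B shape). Full fine-tuning updates all d² weights, while LoRA only trains the two low-rank factors A (r × d) and B (d × r):

```python
# Back-of-the-envelope: trainable parameters for one square projection.
d = 2048  # hidden dimension (illustrative)
r = 16    # LoRA rank, as in the configuration above

full_ft = d * d       # parameters updated by full fine-tuning
lora = 2 * r * d      # LoRA trains A (r x d) plus B (d x r)

print(full_ft)        # 4194304
print(lora)           # 65536
print(full_ft / lora) # 64.0
```

A 64x reduction per layer, which is why LoRA fits comfortably in the memory left over after loading the quantized base weights.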

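Before we can call the trainer, we need to load the Spider dataset, format it into prompt/response text, and construct an SFTTrainer. The following is a minimal sketch; the dataset formatting, field names, and hyperparameters here are illustrative assumptions rather than the exact setup, and the SFTTrainer keyword arguments vary between trl versions:

```python
# Sketch of the trainer setup (hyperparameters are assumptions, not tuned values).
from datasets import load_dataset
from transformers import TrainingArguments, DataCollatorForSeq2Seq
from trl import SFTTrainer
from unsloth import is_bfloat16_supported

# Spider 1.0 pairs natural-language questions with gold SQL queries.
dataset = load_dataset("spider", split="train")

def format_example(example):
    # Turn each (question, query) pair into a single training string.
    return {"text": f"Question: {example['question']}\nSQL: {example['query']}"}

dataset = dataset.map(format_example)

trainer = SFTTrainer(
    model = model,
    tokenizer = tokenizer,
    train_dataset = dataset,
    dataset_text_field = "text",
    max_seq_length = 2048,
    data_collator = DataCollatorForSeq2Seq(tokenizer = tokenizer),
    args = TrainingArguments(
        per_device_train_batch_size = 2,
        gradient_accumulation_steps = 4,
        max_steps = 60,
        learning_rate = 2e-4,
        fp16 = not is_bfloat16_supported(),
        bf16 = is_bfloat16_supported(),
        logging_steps = 1,
        optim = "adamw_8bit",
        output_dir = "outputs",
    ),
)
```
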
Once the SFTTrainer has been constructed with the model, tokenizer, and dataset, starting the training process is as simple as the following line of code:

stats = trainer.train() 

This starts the training process and prints the loss at every step. You should see a sharp decrease in loss early on that flattens out in the later steps. After a few minutes, the run finishes. You have now successfully fine-tuned the Llama 3.2 1B model with Unsloth to perform SQL tasks!