Fine-Tuning Llama 3.2 1B with Unsloth
In this article, we will fine-tune the Llama 3.2 1B model with Unsloth on the Spider 1.0 text-to-SQL dataset. The goal of the article is to improve the SQL-generation capabilities of a general-purpose Llama 3.2 1B model.
Prerequisites
Before we get started, we assume that you have access to a GPU that can be used for training, as well as a working Python installation.
Creating the Environment
First, we want to set up a Python virtual environment (venv) for the project and install the libraries we need. To create a Python venv, run the following command:
python -m venv venv
This will create a new directory called venv in the current working directory. We can then activate the environment by running the following command:
source venv/bin/activate
Now, we want to install the necessary libraries. Copy the following requirements.txt file into your local development environment:
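The original requirements.txt did not survive into this version of the article. Based on the imports used later, a minimal file might look like the following — the package list is inferred and versions are deliberately unpinned; pin whatever versions work on your hardware:

```text
torch
unsloth
transformers
trl
datasets
```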
To install the packages, run the following command:
pip install -r requirements.txt
Now we have a working environment and can start fine-tuning our model. To do so, create a Jupyter notebook and import the required libraries:
import torch
from unsloth import FastLanguageModel
from unsloth.chat_templates import get_chat_template
from unsloth import is_bfloat16_supported
from transformers import TrainingArguments, DataCollatorForSeq2Seq
from trl import SFTTrainer
Some of these libraries may be unfamiliar, so here is a brief description of each:
| Library | Description |
|---|---|
| unsloth.FastLanguageModel | This class is used to load the model. It behaves much like Hugging Face’s AutoModelForCausalLM class, but with improved memory and speed efficiency. |
| unsloth.chat_templates.get_chat_template | This function retrieves the chat template for the model, which formats prompts and responses into the structure the model expects. |
| unsloth.is_bfloat16_supported | This function checks whether your hardware supports the bfloat16 precision, which we use to pick the training precision. |
| transformers.TrainingArguments | This class holds the training hyperparameters (learning rate, batch size, number of steps, and so on). |
| transformers.DataCollatorForSeq2Seq | This class batches training examples together, padding each batch to a common length. |
| trl.SFTTrainer | This class runs the supervised fine-tuning (SFT) loop on the model. |
| torch | This is the standard deep learning library, also known as PyTorch. |
Starting the Fine-Tuning Process
We are finally ready to start the process and import our first model.
To do so, we use the FastLanguageModel class to load it. You may notice a few different arguments present during the loading process. These arguments customize the model's behavior, and their definitions are below:
| Argument | Description |
|---|---|
| model_name | The name of the model to load. In this case, we are loading a Llama-3.2 1B Instruct model that is hosted by Unsloth. Instruct means that the model is fine-tuned to follow instructions. |
| max_seq_length | The maximum sequence length of data that the model can process at once. In this case, we are setting it to 2048 as most SQL prompts are around that size. A larger sequence length will require more memory and time to process and vice versa for smaller ones. |
| load_in_4bit | Setting this flag to true enables 4-bit quantization of the model. This reduces the weights from their original precision to 4 bits, which shrinks the memory footprint of the model. |
| dtype | Passing None as the dtype means that the library will automatically determine the best data type for the model based on your hardware. |
Now that we have loaded the model, we can create the configuration parameters for the PEFT LoRA fine-tuning process.
As above, we clarify some of the arguments in this configuration:
| Argument | Description |
|---|---|
| r | The rank of the LoRA matrices. A lower rank means less memory usage, but also less flexibility. |
| target_modules | The modules to apply LoRA to. In this case, we apply it to all of the attention projection layers. |
| lora_alpha | The alpha value for LoRA, which scales the LoRA update and therefore controls its strength. |
| lora_dropout | The dropout rate applied to the LoRA layers. This helps prevent overfitting. |
| bias | Whether to train the bias terms. Setting this to "none" is the most memory-efficient choice. |
| use_gradient_checkpointing | The gradient checkpointing method to use. In this case, we use Unsloth's own "unsloth" method, which trades extra compute for lower memory usage. |
| random_state | Sets a random seed for reproducibility. |
Now that we have configured the trainer, we can start the training process. In this case, it is as simple as the following line of code:
stats = trainer.train()
This will start the training process and print the loss at every step. You should see a sharp decrease in loss early on, which then flattens out in the later steps. After a few minutes, training will finish. You have now successfully fine-tuned the Llama 3.2 1B model with Unsloth to perform SQL tasks!