Have you ever wondered how to go about fine-tuning an LLM? In this article, we're going to do just that.
We'll take a quick look at LLMs from 50,000 feet, then explore what fine-tuning is and when you should consider using it, and then we'll get our hands dirty fine-tuning a very small LLM, including creating our own custom training data. Although the model we're using is very small, the principles and much of the code still apply.
Unless you've been living under a rock for the last couple of years, you will have heard about ChatGPT from OpenAI. ChatGPT is an LLM. It's also known as a "Transformer" model due to the way it is architected, and, more broadly, as "Generative" AI because it literally generates output. Essentially, an LLM predicts the next word.
In the beginning there was nothing, nothing but darkness. For an LLM, there is a network of neurons, billions of parameters across many layers, with a completely meaningless random distribution of weights.
Before the LLM can predict the next word, it needs to learn about words. It needs to learn about language. It needs to learn about the relationships between words. And to do this, it is trained on huge datasets. Depending on the size of the model, it might be trained on a large slice of everything humans have ever published, across many languages.
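A quick aside: the model never works with words directly. A tokenizer turns text into numbers first. Here's a minimal sketch using the tokenizer of the small model we'll fine-tune later, just to show what the model actually "sees":

from transformers import AutoTokenizer

# Load the tokenizer for the small model we'll fine-tune later.
tokenizer = AutoTokenizer.from_pretrained("TinyLlama/TinyLlama-1.1B-Chat-v1.0")

text = "Fine-tuning a small LLM"
token_ids = tokenizer.encode(text)                   # text -> integer ids
tokens = tokenizer.convert_ids_to_tokens(token_ids)  # the sub-word pieces behind those ids

print(tokens)      # the sub-word pieces the model sees
print(token_ids)   # the numbers it is actually trained on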
It's a big deal. It takes huge amounts of compute, storage and electricity. We're talking about tens of millions of US dollars.
And at the end of this process it is very, very good at predicting the next word.
Well, that's a good question, and beyond the scope of this article. However, very, very broadly: as it's trained, it's given input and it generates output, the prediction. The error is calculated (remember, we're dealing with numbers, not words) and this error is propagated back through the network, nudging its billions of weights fractionally in the right direction, towards the correct answer.
During this lengthy training period, the loss gets smaller and smaller - that is, it gets better and better at predicting the next word.
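If you'd like to see that loop in miniature, here's a toy PyTorch sketch. A single linear layer stands in for the billions of parameters, and made-up numbers stand in for the training data, but the predict / measure the error / nudge the weights cycle is the same:

import torch
import torch.nn as nn

# A toy stand-in for an LLM: one linear layer instead of billions of parameters.
model = nn.Linear(10, 10)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

inputs = torch.randn(32, 10)             # a batch of made-up input features
targets = torch.randint(0, 10, (32,))    # the "correct next word" for each input

for step in range(100):
    logits = model(inputs)               # the prediction
    loss = loss_fn(logits, targets)      # how wrong was it?
    loss.backward()                      # propagate the error back through the network
    optimizer.step()                     # nudge the weights fractionally in the right direction
    optimizer.zero_grad()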
At this point, it might receive additional training, say, fine-tuning, so that it can learn how to chat or follow instructions.
And in a roundabout way this brings us to the title of our article.
Okay. Fine-tuning can be used to teach your LLM new knowledge - say, recent news or company-specific details.
Fine-tuning can be used to train the model to output data in a specific format that it doesn't support out-of-the-box.
Perhaps there's a new technique like function-calling and you want to teach the LLM to support it.
There are 3 main approaches:
LoRA / QLoRA "adapters" are much smaller matrices which are added to the original model's layers. Think of it as a patch that sits on top of the existing layers. Outputs from the original layers and adapter are "summed" together. With LoRA / QLoRA you will load the original pre-trained model and then the adapter.
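In this article we'll keep things simple and do a straightforward full fine-tune of a tiny model rather than training an adapter. For reference, though, attaching LoRA adapters with the peft library looks roughly like this (the rank and target modules here are illustrative choices, not the only valid ones):

from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

base_model = AutoModelForCausalLM.from_pretrained("TinyLlama/TinyLlama-1.1B-Chat-v1.0")

# Small, low-rank adapter matrices attached to the attention projections.
lora_config = LoraConfig(
    r=8,                                  # rank of the adapter matrices
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],  # which layers get "patched"
    task_type="CAUSAL_LM",
)

model = get_peft_model(base_model, lora_config)
model.print_trainable_parameters()  # only a tiny fraction of the weights will train

With that out of the way, let's set up a Python virtual environment: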
python3 -m venv .
source bin/activate
We'll need a handful of libraries to make light work of this.
pip install huggingface_hub
pip install datasets
pip install transformers
pip install torch
pip install peft
pip install trl
Let's create a file and just get a feel for running the model as-is and getting it to respond to a prompt.
from transformers import pipeline
import torch

if torch.backends.mps.is_available():
    device = torch.device("mps")
    print("MPS is available. Using MPS device.")
else:
    device = torch.device("cpu")
    print("MPS not available. Using CPU device.")

pipe = pipeline("text-generation", model="TinyLlama/TinyLlama-1.1B-Chat-v1.0", device=device)

messages = [
    {"role": "user", "content": "Tell me about Adrian Latham"},
]

output = pipe(messages)
print(output)
Here is an example of the shape of the training data that works for TinyLlama.
{ "instruction": "What can you tell me about Adrian Latham?", "input": "What can you tell me about Adrian Latham?", "output": "Adrian Latham is CEO / CTO of The Disruption Laboratory Ltd. He is 50 years old and living in Da Nang" }
So, I'm only using the tiniest of training data. In reality you will want hundreds if not thousands of examples.
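You won't want to hard-code hundreds of examples in your script the way we do in the walkthrough below. One common pattern (sketched here, assuming a hypothetical train.jsonl file with one JSON object per line, in the shape shown above) is to load them with the datasets library:

from datasets import load_dataset

# Assumes train.jsonl contains one {"instruction", "input", "output"} object per line.
dataset = load_dataset("json", data_files="train.jsonl", split="train")
print(dataset[0])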
from transformers import AutoTokenizer, AutoModelForCausalLM, TrainingArguments
from trl import SFTTrainer
from datasets import Dataset

model_id = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

data = [
    {
        "instruction": "What can you tell me about Adrian Latham?",
        "input": "What can you tell me about Adrian Latham?",
        "output": "Adrian Latham is CEO / CTO of The Disruption Laboratory Ltd. He is 50 years old and living in Da Nang, Vietnam. He's a software engineer.",
    },
    {
        "instruction": "Give me some information on Adrian Latham.",
        "input": "Give me some information on Adrian Latham.",
        "output": "Adrian Latham is CEO / CTO of The Disruption Laboratory Ltd. He is 50 years old and living in Da Nang.",
    },
    {
        "instruction": "Could you provide details about Adrian Latham?",
        "input": "Could you provide details about Adrian Latham?",
        "output": "Adrian Latham is CEO / CTO of The Disruption Laboratory Ltd. He is 50 years old and living in Da Nang, Vietnam. He's a software engineer.",
    },
]

def combine_fields(example):
    # Collapse each example into a single "text" field for the trainer.
    return {"text": example["instruction"] + " " + example["input"] + " " + example["output"]}

dataset = Dataset.from_list(data).map(combine_fields, batched=False)

args = TrainingArguments(
    output_dir="output",
    num_train_epochs=1,              # Reduced epochs
    per_device_train_batch_size=32,  # Increased batch size
    learning_rate=5e-5,              # Slightly reduced learning rate
    optim="adamw_torch",             # Use a more robust optimizer
    warmup_ratio=0.05,               # helps stabilize
)

trainer = SFTTrainer(
    model=model,
    train_dataset=dataset,
    args=args,  # pass the training arguments defined above to the trainer
)
trainer.train()

save_directory = "./output"
model.save_pretrained(save_directory)
tokenizer.save_pretrained(save_directory)  # Important to save the tokenizer as well
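Once training finishes, the obvious sanity check is to point the same inference pipeline from earlier at the saved directory and ask the same question again. A minimal sketch, assuming everything was saved to ./output as above:

from transformers import pipeline

# Load the fine-tuned weights and tokenizer we just saved.
pipe = pipeline("text-generation", model="./output", tokenizer="./output")

messages = [
    {"role": "user", "content": "Tell me about Adrian Latham"},
]

print(pipe(messages))

With only three examples and a single epoch, don't expect a dramatic transformation; as mentioned above, hundreds or thousands of examples make the difference much clearer.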