DML: Why & when do you need to fine-tune open-source LLMs? What about fine-tuning vs. prompt engineering?
Lesson 5 | The Hands-on LLMs Series
Hello there, I am Paul Iusztin 👋🏼
Within this newsletter, I will help you decode complex topics about ML & MLOps one week at a time 🔥
Table of Contents:
Using this Python package, you can 10x your text preprocessing pipeline development.
Why & when do you need to fine-tune open-source LLMs? What about fine-tuning vs. prompt engineering?
Fine-tuning video lessons
Previous Lessons:
↳ 🔗 Check out the Hands-on LLMs course and support it with a ⭐.
#1. Using this Python package, you can 10x your text preprocessing pipeline development
Any text preprocessing pipeline has to clean, partition, extract, or chunk text data to feed it into your LLMs.
Unstructured offers a rich and clean API that allows you to quickly:
- partition your data into smaller segments from various data sources (e.g., HTML, CSV, PDFs, even images, etc.)
- clean the text of anomalies (e.g., wrong ASCII characters) and irrelevant information (e.g., white spaces, bullets, etc.), and fill in missing values
- extract information from pieces of text (e.g., datetimes, addresses, IP addresses, etc.)
- chunk your text segments into pieces of text that fit your embedding model's input
- embed data (e.g., wrappers over OpenAIEmbeddingEncoder, HuggingFaceEmbeddingEncoder, etc.)
- stage your data to be fed into various tools (e.g., Label Studio, Labelbox, etc.)
All these steps are essential for:
- feeding your data into your LLMs
- embedding the data and ingesting it into a vector DB
- doing RAG
- labeling
- recommender systems
... basically, for any LLM or multimodal application
Implementing all these steps from scratch will take a lot of time.
I know some Python packages already do this, but the functionality is scattered across multiple packages.
Unstructured packages everything together under a nice, clean API.
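To make this concrete, here is a minimal sketch of such a flow, assuming you process a single HTML page (the URL is hypothetical, and module paths can vary slightly between unstructured versions):

```python
from unstructured.chunking.title import chunk_by_title
from unstructured.cleaners.core import clean, replace_unicode_quotes
from unstructured.partition.html import partition_html
from unstructured.staging.label_studio import stage_for_label_studio

# 1. Partition: split the raw HTML document into semantic elements.
elements = partition_html(url="https://example.com/article")  # hypothetical URL

# 2. Clean: normalize quotes and strip bullets, dashes & extra whitespace.
for element in elements:
    element.apply(replace_unicode_quotes)
    element.apply(lambda text: clean(text, bullets=True, extra_whitespace=True, dashes=True))

# 3. Chunk: group the elements into pieces sized for your embedding model.
chunks = chunk_by_title(elements, max_characters=500)

# 4. Stage: convert the elements into Label Studio's upload format.
label_studio_data = stage_for_label_studio(elements)
```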
↳ Check it out.
#2. Why & when do you need to fine-tune open-source LLMs? What about fine-tuning vs. prompt engineering?
Fine-tuning is the process of taking a pre-trained model and further refining it on a specific task.
First, let's clarify what methods of fine-tuning an open-source LLM exist ↓
- Continued pre-training: utilize domain-specific data to apply the same pre-training process (next-token prediction) on the pre-trained (base) model
- Instruction fine-tuning: the pre-trained (base) model is fine-tuned on a Q&A dataset to learn to answer questions
- Single-task fine-tuning: the pre-trained model is refined for a specific task, such as toxicity detection, coding, medical advice, etc.
- RLHF: requires collecting human preferences (e.g., pairwise comparisons), which are then used to train a reward model. The reward model is then used to fine-tune the LLM via RL techniques such as PPO.
A common approach is to take a pre-trained LLM (trained on next-word prediction) and apply instruction & single-task fine-tuning, as sketched below.
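For instance, instruction fine-tuning can be done with off-the-shelf tooling. Here is a minimal sketch using Hugging Face TRL's SFTTrainer; the model and dataset names are illustrative (not the course's exact setup), and argument names vary slightly across trl versions:

```python
from datasets import load_dataset
from trl import SFTConfig, SFTTrainer

# An open instruction dataset (illustrative choice) with a "text" column
# holding prompt/answer pairs.
dataset = load_dataset("timdettmers/openassistant-guanaco", split="train")

trainer = SFTTrainer(
    model="tiiuae/falcon-7b",  # any open-source base LLM can go here
    train_dataset=dataset,
    args=SFTConfig(output_dir="sft_output", max_seq_length=512),
)
trainer.train()  # next-token prediction on the instruction dataset
```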
Why do you need to fine-tune the LLM?
You do instruction fine-tuning to make the LLM learn to answer your questions.
The exciting part is when you want to fine-tune your LLM on a single task.
Here is why ↓
- performance: it will improve your LLM's performance on the given use case (e.g., coding, extracting text, etc.). Mainly, the LLM will specialize in a given task (a specialist will always beat a generalist in its domain)
- control: you can refine how your model should behave on specific inputs and outputs, resulting in a more robust product
- modularization: you can create an army of smaller models, where each is specialized on a particular task, increasing the overall system's performance. Usually, when you fine-tune on one task, it reduces the performance of the other tasks (known as the alignment tax). Thus, having an expert system of multiple smaller models can improve the overall performance.
What about prompt engineering vs. fine-tuning?
- data: use prompting when you don't have data available (~2 examples are enough). Fine-tuning needs at least ~100 examples to work well.
- cost: prompting forces you to write long & detailed prompts to reach your desired level of performance. You pay per token (API- or compute-wise), so as the prompt gets bigger, your costs increase. But when you fine-tune an LLM, you incorporate all that knowledge inside the model. Hence, you can use much shorter prompts with similar performance.
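To make the cost point concrete, here is a hypothetical back-of-the-envelope comparison (the price and token counts are made up purely for illustration):

```python
# Hypothetical numbers, purely for illustration.
PRICE_PER_1K_TOKENS = 0.01  # assumed API price, USD
REQUESTS_PER_DAY = 10_000

few_shot_prompt_tokens = 1_500  # long prompt: instructions + in-context examples
fine_tuned_prompt_tokens = 200  # short prompt: the knowledge lives in the weights


def daily_cost(tokens_per_request: int) -> float:
    return tokens_per_request / 1_000 * PRICE_PER_1K_TOKENS * REQUESTS_PER_DAY


print(f"few-shot prompting: ${daily_cost(few_shot_prompt_tokens):,.2f}/day")    # $150.00
print(f"fine-tuned model:   ${daily_cost(fine_tuned_prompt_tokens):,.2f}/day")  # $20.00
```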
When you start a project, a good strategy is to write a wrapper over an API (e.g., OpenAI's GPT-4, Anyscale, etc.) that defines the interface you need, so you can easily swap it for your open-source implementation in future iterations.
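Here is a minimal sketch of that pattern (all names are illustrative; the actual API client and inference code are omitted):

```python
from abc import ABC, abstractmethod


class LLM(ABC):
    """The interface the rest of the application depends on."""

    @abstractmethod
    def generate(self, prompt: str) -> str: ...


class OpenAILLM(LLM):
    """First iteration: a thin wrapper over a hosted API."""

    def generate(self, prompt: str) -> str:
        raise NotImplementedError("call the hosted API here")


class FineTunedLLM(LLM):
    """Later iteration: your fine-tuned open-source model, same interface."""

    def generate(self, prompt: str) -> str:
        raise NotImplementedError("run inference on your own model here")


def answer(llm: LLM, question: str) -> str:
    # The application only sees `LLM`, so swapping backends is a one-line change.
    return llm.generate(f"Answer concisely: {question}")
```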
↳ 🔗 Check out the Hands-on LLMs course to see this in action.
#3. Fine-tuning video lessons
As you might know, Pau Labarta Bajo and I are also working on a free Hands-on LLMs course that contains the open-source code + a set of video lessons.
Here are the 2 video lessons about fine-tuning ↓
01 Hands-on LLMs | Theoretical Part
Here is a summary of the 1st video lesson ↓
Why fine-tune large language models?
1. Performance: Fine-tuning a large language model (LLM) can improve performance, especially for specialized tasks.
2. Economics: Fine-tuned models are smaller and thus cheaper to run. This is crucial, given that LLMs can have billions of parameters.
What do you need to implement a fine-tuning pipeline?
1. Dataset: You need a dataset of input-output examples, which can be created manually or semi-automatically using existing LLMs such as GPT-3.5.
2. Base LLM: Choose an open-source LLM from repositories like Hugging Face's Model Hub (e.g., Falcon 7B).
3. Fine-tuning script: a data loader + a trainer.
4. Advanced fine-tuning techniques to fine-tune the model on cheap hardware: QLoRA (see the sketch after this list).
5. MLOps: an experiment tracker + a model registry.
6. Infrastructure: Comet + Beam.
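As a taste of point 4, here is a minimal QLoRA setup sketch using transformers + bitsandbytes + peft; the hyperparameters are illustrative, not the course's exact configuration:

```python
import torch
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# Quantize the frozen base weights to 4 bits (the "Q" in QLoRA).
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    "tiiuae/falcon-7b", quantization_config=bnb_config, device_map="auto"
)

# Attach small trainable low-rank adapters (the "LoRA" part).
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["query_key_value"],  # Falcon's fused attention projection
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only the tiny adapter matrices train
```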
02 Hands-on LLMs | Diving into the code
Here is a short walkthrough of the lesson ↓
1. How to set up the code and environment using Poetry
2. How to configure Comet & Beam
3. How to start the training pipeline locally (if you have a CUDA-enabled GPU) or on Beam (for running your training pipeline on serverless infrastructure, so it doesn't matter what hardware you have).
4. An overview of the code
5. Clarifying why we integrated Poetry, a model registry, and linting within the training pipeline.
This video is critical for everyone who wants to replicate the training pipeline of our course on their system. The previous lesson focused on the theoretical parts of the training pipeline.
↳ 🔗 To find the code & all the videos, check out the Hands-on LLMs GitHub repository.
That's it for today!
See you next Thursday at 9:00 a.m. CET.
Have a fantastic weekend!
…and see you next week for Lesson 6 of the Hands-On LLMs series 🔥
Paul
Whenever you're ready, here is how I can help you:
The Full Stack 7-Steps MLOps Framework: a 7-lesson FREE course that will walk you step-by-step through how to design, implement, train, deploy, and monitor an ML batch system using MLOps good practices. It contains the source code + 2.5 hours of reading & video materials on Medium.
Machine Learning & MLOps Blog: in-depth topics about designing and productionizing ML systems using MLOps.
Machine Learning & MLOps Hub: a place where all my work is aggregated in one place (courses, articles, webinars, podcasts, etc.).