DML: 7-steps to build a production-ready financial assistant using LLMs
How to fine-tune any LLM at scale in under 5 minutes. 7 steps to build a production-ready financial assistant using LLMs.
Hello there, I am Paul Iusztin.
Within this newsletter, I will help you decode complex topics about ML & MLOps one week at a time.
This week's ML & MLOps topics:
Writing your own ML models is history. How to fine-tune any LLM at scale in under 5 minutes.
7 steps to chain your prompts to build a production-ready financial assistant using LLMs.
Extra: 3 key resources on how to monitor your ML models
#1. Writing your own ML models is history. How to fine-tune any LLM at scale in under 5 minutes.
Writing your own ML models is history.
The true value is in your data, how you prepare it, and your compute power.
To demonstrate this, here is how you can write a Python script to train an LLM at scale in under 5 minutes ↓
#1. Load your data in JSON format and convert it into a Hugging Face Dataset.
#2. Use Hugging Face to load the LLM and pass it to the SFTTrainer, along with the tokenizer and training & evaluation datasets.
#3. Wrap your training script with a serverless solution, such as Beam, which quickly lets you access a cluster of GPUs to train large models.
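The three steps above can be sketched as a single script. This is a minimal, illustrative sketch, not the article's actual code: the JSON field names (`question`, `answer`), the prompt template, and the model name (`tiiuae/falcon-7b`) are assumptions, and the `SFTTrainer` wiring follows a common TRL usage pattern that may differ across library versions.

```python
import json

def build_training_text(sample: dict) -> str:
    """Convert one raw JSON record into a single training string.
    The field names and template are hypothetical examples."""
    return f"### Question:\n{sample['question']}\n\n### Answer:\n{sample['answer']}"

def load_samples(path: str) -> list[dict]:
    """Step 1: load the JSON file and map each record to a {'text': ...} dict."""
    with open(path) as f:
        return [{"text": build_training_text(s)} for s in json.load(f)]

def train(json_path: str) -> None:
    """Steps 2-3: wire the dataset into Hugging Face + TRL (sketch only)."""
    from datasets import Dataset
    from transformers import AutoModelForCausalLM, AutoTokenizer
    from trl import SFTTrainer

    # Split the Hugging Face Dataset into training & evaluation sets.
    dataset = Dataset.from_list(load_samples(json_path)).train_test_split(test_size=0.1)
    model = AutoModelForCausalLM.from_pretrained("tiiuae/falcon-7b")
    tokenizer = AutoTokenizer.from_pretrained("tiiuae/falcon-7b")

    trainer = SFTTrainer(
        model=model,
        tokenizer=tokenizer,
        train_dataset=dataset["train"],
        eval_dataset=dataset["test"],
        dataset_text_field="text",  # column holding the full training string
    )
    # Run this function inside a Beam-decorated job to get serverless GPUs.
    trainer.train()
```

To scale it, you would decorate `train()` with Beam's runtime so the same script executes on a remote GPU cluster instead of your laptop.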
As you can see, the secret ingredients are not the LLM itself but:
- the amount of data
- the quality of data
- how you process the data
- $$$ for compute power
- the ability to scale the system
My advice
↳ If you don't plan to become an ML researcher, shift your focus from the latest models to your data and infrastructure.
Note: Integrating serverless services, such as Beam, makes the deployment of your training pipeline fast & seamless, leaving you to focus only on the last piece of the puzzle: your data.
↳ Check out Beam's docs to find out more.
#2. 7 steps to chain your prompts to build a production-ready financial assistant using LLMs.
7 steps on how to chain your prompts to build a production-ready financial assistant using LLMs ↓
When building LLM applications, you frequently have to divide your application into multiple steps & prompts, a technique known as "chaining prompts".
Here are 7 standard steps for building a financial assistant (or any other assistant) using LLMs ↓
Step 1: Check if the user's question is safe using OpenAI's Moderation API.
If the user's query is safe, move to Step 2 ↓
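This safety gate can be sketched in a few lines. The `moderate()` wrapper around OpenAI's Moderation endpoint is illustrative (it assumes the `openai` package and an API key); the decision logic itself is plain Python.

```python
def is_safe(moderation_response: dict) -> bool:
    """Return True when no moderation result flagged the text."""
    return not any(result["flagged"] for result in moderation_response["results"])

def moderate(text: str) -> dict:
    # Hypothetical wrapper around OpenAI's Moderation endpoint;
    # requires the `openai` package and an OPENAI_API_KEY.
    from openai import OpenAI

    client = OpenAI()
    return client.moderations.create(input=text).model_dump()

# Gate the pipeline: only safe questions move on to Step 2.
# if is_safe(moderate(user_question)):
#     ...retrieve context, build the prompt, call the LLM...
```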
Step 2: Query your proprietary data (e.g., financial news) to enrich the prompt with fresh data & additional context.
To do so, you have to:
- use a language model (LM) to embed the user's input
- use the embedding to query your proprietary data stored in a vector DB
Note: You must use the same LM to embed:
- the data that will be stored in the vector DB
- the user's question used to query the vector DB
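Here is a toy sketch of that retrieval step. The hand-written 2-dimensional vectors stand in for real embeddings, and the list of (text, vector) pairs stands in for a vector DB; both are assumptions for illustration only.

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def top_k(query_vec: list[float], vector_db: list[tuple[str, list[float]]], k: int = 3) -> list[str]:
    """Return the k stored texts most similar to the query embedding."""
    scored = sorted(vector_db, key=lambda item: cosine(query_vec, item[1]), reverse=True)
    return [text for text, _ in scored[:k]]

# Crucially, the SAME embedding model must have produced both the stored
# vectors and the query vector, or the similarities are meaningless.
vector_db = [("Fed raises rates", [0.9, 0.1]), ("Tech stocks rally", [0.1, 0.9])]
print(top_k([0.8, 0.2], vector_db, k=1))  # prints ['Fed raises rates']
```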
Step 3: Build the prompt using:
- a predefined template
- the user's question
- extracted financial news as context
- your conversation history as context
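Assembling those four pieces can be sketched as a simple template fill. The template wording below is a hypothetical example, not the article's actual prompt.

```python
# Predefined template combining the question, retrieved news, and history.
PROMPT_TEMPLATE = """You are a helpful financial assistant.

Conversation so far:
{history}

Relevant financial news:
{context}

Question: {question}
Answer:"""

def build_prompt(question: str, news: list[str], history: list[tuple[str, str]]) -> str:
    """Fill the predefined template with the user's question plus both context sources."""
    history_text = "\n".join(f"User: {q}\nAssistant: {a}" for q, a in history)
    context_text = "\n".join(f"- {item}" for item in news)
    return PROMPT_TEMPLATE.format(
        history=history_text or "(empty)",
        context=context_text or "(none)",
        question=question,
    )
```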
Step 4: Call the LLM.
Step 5: Check if the assistant's answer is safe using OpenAI's Moderation API.
If the assistant's answer is safe, move to Step 6 ↓
Step 6: Use an LLM to check if the final answer is satisfactory.
To do so, you build a prompt using the following:
- a predefined validation template
- the user's initial question
- the assistant's answer
The LLM has to give a "yes" or "no" answer.
Thus, if it answers "yes," we show the final answer to the user. Otherwise, we will return a predefined response, such as:
"Sorry, we couldn't answer your question because we don't have enough information."
Step 7: Add the user's question and the assistant's answer to a history cache, which will be used to enrich the following prompts with the current conversation.
Just to remind you, the assistant should support a conversation; thus, it needs to know what happened in the previous questions.
In practice, you usually keep only the latest N (question, answer) tuples or a conversation summary to keep your context length under control.
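A minimal sketch of that bounded history cache, using a `deque` so old turns are dropped automatically; the class name and the default N=3 are arbitrary choices for illustration.

```python
from collections import deque

class ConversationHistory:
    """Keep only the latest N (question, answer) tuples (Step 7)."""

    def __init__(self, max_turns: int = 3):
        # deque with maxlen silently discards the oldest turn when full.
        self._turns: deque = deque(maxlen=max_turns)

    def add(self, question: str, answer: str) -> None:
        self._turns.append((question, answer))

    def as_context(self) -> list[tuple[str, str]]:
        """Return the retained turns, oldest first, for the next prompt."""
        return list(self._turns)
```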
↳ If you want to see this strategy in action, check out our new FREE Hands-on LLMs course (work in progress) & give it a ⭐ on GitHub to stay updated with its latest progress.
Extra: 3 key resources on how to monitor your ML models
In the last month, I read 100+ ML monitoring articles.
I trimmed them down to 3 key resources for you:
1. An excellent series of articles by Arize AI that will make you understand what ML monitoring is all about.
↳ Arize Articles
2. The Evidently AI Blog, where you can find answers to all your questions regarding ML monitoring.
↳ Evidently Blog
3. The hands-on monitoring examples hosted by DataTalksClub, which will teach you how to implement an ML monitoring system.
↳ DataTalks Course
After wasting a lot of time reading other resources...
Using these 3 resources is a solid start for learning about monitoring ML systems.
That's it for today!
See you next Thursday at 9:00 a.m. CET.
Have a fantastic weekend!
Paul
Whenever you're ready, here is how I can help you:
The Full Stack 7-Steps MLOps Framework: a 7-lesson FREE course that will walk you step-by-step through how to design, implement, train, deploy, and monitor an ML batch system using MLOps good practices. It contains the source code + 2.5 hours of reading & video materials on Medium.
Machine Learning & MLOps Blog: in-depth topics about designing and productionizing ML systems using MLOps.
Machine Learning & MLOps Hub: a place where all my work is aggregated in one place (courses, articles, webinars, podcasts, etc.).