DML: 7-steps to build a production-ready financial assistant using LLMs
How to fine-tune any LLM at scale in under 5 minutes. 7 steps to build a production-ready financial assistant using LLMs.
Hello there, I am Paul Iusztin.
Within this newsletter, I will help you decode complex topics about ML & MLOps one week at a time.
This week's ML & MLOps topics:
Writing your own ML models is history. How to fine-tune any LLM at scale in under 5 minutes.
7 steps to chain your prompts to build a production-ready financial assistant using LLMs.
Extra: 3 key resources on how to monitor your ML models
#1. Writing your own ML models is history. How to fine-tune any LLM at scale in under 5 minutes.
Writing your own ML models is history.
The true value is in your data, how you prepare it, and your compute power.
To demonstrate this, here is how you can write a Python script to train an LLM at scale in under 5 minutes ↓
#1. Load your data in JSON format and convert it into a Hugging Face Dataset.
#2. Use Hugging Face to load the LLM and pass it to the SFTTrainer, along with the tokenizer and training & evaluation datasets.
#3. Wrap your training script with a serverless solution, such as Beam, which quickly lets you access a cluster of GPUs to train large models.
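The three steps above can be sketched as a single script. This is a minimal, illustrative sketch, not the article's actual code: the JSON field names (`question`, `answer`), the prompt template, and the model name (`tiiuae/falcon-7b`) are assumptions, and the `SFTTrainer` wiring follows a common TRL usage pattern that may differ across library versions.

```python
import json

def build_training_text(sample: dict) -> str:
    """Convert one raw JSON record into a single training string.
    The field names and template are hypothetical examples."""
    return f"### Question:\n{sample['question']}\n\n### Answer:\n{sample['answer']}"

def load_samples(path: str) -> list[dict]:
    """Step 1: load the JSON file and map each record to a {'text': ...} dict."""
    with open(path) as f:
        return [{"text": build_training_text(s)} for s in json.load(f)]

def train(json_path: str) -> None:
    """Steps 2-3: wire the dataset into Hugging Face + TRL (sketch only)."""
    from datasets import Dataset
    from transformers import AutoModelForCausalLM, AutoTokenizer
    from trl import SFTTrainer

    # Split the Hugging Face Dataset into training & evaluation sets.
    dataset = Dataset.from_list(load_samples(json_path)).train_test_split(test_size=0.1)
    model = AutoModelForCausalLM.from_pretrained("tiiuae/falcon-7b")
    tokenizer = AutoTokenizer.from_pretrained("tiiuae/falcon-7b")

    trainer = SFTTrainer(
        model=model,
        tokenizer=tokenizer,
        train_dataset=dataset["train"],
        eval_dataset=dataset["test"],
        dataset_text_field="text",  # column holding the full training string
    )
    # Run this function inside a Beam-decorated job to get serverless GPUs.
    trainer.train()
```

To scale it, you would decorate `train()` with Beam's runtime so the same script executes on a remote GPU cluster instead of your laptop.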
As you can see, the secret ingredients are not the LLM itself but:
- the amount of data
- the quality of data
- how you process the data
- $$$ for compute power
- the ability to scale the system
My advice
↳ If you don't plan to become an ML researcher, shift your focus from the latest models to your data and infrastructure.
Note: Integrating serverless services, such as Beam, makes the deployment of your training pipeline fast & seamless, leaving you to focus only on the last piece of the puzzle: your data.
↳ Check out Beam's docs to find out more.
#2. 7 steps to chain your prompts to build a production-ready financial assistant using LLMs.
7 steps on how to chain your prompts to build a production-ready financial assistant using LLMs ↓
When building LLM applications, you frequently have to divide your application into multiple steps & prompts, a technique known as "chaining prompts".
Here are 7 standard steps for building a financial assistant (or any other assistant) using LLMs ↓
Step 1: Check if the user's question is safe using OpenAI's Moderation API.
If the user's query is safe, move to Step 2 ↓
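This safety gate can be sketched in a few lines. The `moderate()` wrapper around OpenAI's Moderation endpoint is illustrative (it assumes the `openai` package and an API key); the decision logic itself is plain Python.

```python
def is_safe(moderation_response: dict) -> bool:
    """Return True when no moderation result flagged the text."""
    return not any(result["flagged"] for result in moderation_response["results"])

def moderate(text: str) -> dict:
    # Hypothetical wrapper around OpenAI's Moderation endpoint;
    # requires the `openai` package and an OPENAI_API_KEY.
    from openai import OpenAI

    client = OpenAI()
    return client.moderations.create(input=text).model_dump()

# Gate the pipeline: only safe questions move on to Step 2.
# if is_safe(moderate(user_question)):
#     ...retrieve context, build the prompt, call the LLM...
```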
Step 2: Query your proprietary data (e.g., financial news) to enrich the prompt with fresh data & additional context.
To do so, you have to:
- use a language model (LM) to embed the user's input
- use the embedding to query your proprietary data stored in a vector DB
Note: You must use the same LM to embed:
- the data that will be stored in the vector DB
- the user's question used to query the vector DB
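Here is a toy sketch of that retrieval step. The hand-written 2-dimensional vectors stand in for real embeddings, and the list of (text, vector) pairs stands in for a vector DB; both are assumptions for illustration only.

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def top_k(query_vec: list[float], vector_db: list[tuple[str, list[float]]], k: int = 3) -> list[str]:
    """Return the k stored texts most similar to the query embedding."""
    scored = sorted(vector_db, key=lambda item: cosine(query_vec, item[1]), reverse=True)
    return [text for text, _ in scored[:k]]

# Crucially, the SAME embedding model must have produced both the stored
# vectors and the query vector, or the similarities are meaningless.
vector_db = [("Fed raises rates", [0.9, 0.1]), ("Tech stocks rally", [0.1, 0.9])]
print(top_k([0.8, 0.2], vector_db, k=1))  # prints ['Fed raises rates']
```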
Step 3: Build the prompt using:
- a predefined template
- the user's question
- extracted financial news as context
- your conversation history as context
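Assembling those four pieces can be sketched as a simple template fill. The template wording below is a hypothetical example, not the article's actual prompt.

```python
# Predefined template combining the question, retrieved news, and history.
PROMPT_TEMPLATE = """You are a helpful financial assistant.

Conversation so far:
{history}

Relevant financial news:
{context}

Question: {question}
Answer:"""

def build_prompt(question: str, news: list[str], history: list[tuple[str, str]]) -> str:
    """Fill the predefined template with the user's question plus both context sources."""
    history_text = "\n".join(f"User: {q}\nAssistant: {a}" for q, a in history)
    context_text = "\n".join(f"- {item}" for item in news)
    return PROMPT_TEMPLATE.format(
        history=history_text or "(empty)",
        context=context_text or "(none)",
        question=question,
    )
```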
Step 4: Call the LLM.
Step 5: Check if the assistant's answer is safe using OpenAI's Moderation API.
If the assistant's answer is safe, move to Step 6 ↓
Step 6: Use an LLM to check if the final answer is satisfactory.
To do so, you build a prompt using the following:
- a predefined validation template
- the user's initial question
- the assistant's answer
The LLM has to give a "yes" or "no" answer.
Thus, if it answers "yes," we show the final answer to the user. Otherwise, we will return a predefined response, such as:
"Sorry, we couldn't answer your question because we don't have enough information."
Step 7: Add the user's question and the assistant's answer to a history cache, which will be used to enrich the following prompts with the current conversation.
Just to remind you, the assistant should support a conversation; thus, it needs to know what happened in the previous questions.
In practice, you usually keep only the latest N (question, answer) tuples or a conversation summary to keep your context length under control.
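A minimal sketch of that bounded history cache, using a `deque` so old turns are dropped automatically; the class name and the default N=3 are arbitrary choices for illustration.

```python
from collections import deque

class ConversationHistory:
    """Keep only the latest N (question, answer) tuples (Step 7)."""

    def __init__(self, max_turns: int = 3):
        # deque with maxlen silently discards the oldest turn when full.
        self._turns: deque = deque(maxlen=max_turns)

    def add(self, question: str, answer: str) -> None:
        self._turns.append((question, answer))

    def as_context(self) -> list[tuple[str, str]]:
        """Return the retained turns, oldest first, for the next prompt."""
        return list(self._turns)
```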
↳ If you want to see this strategy in action, check out our new FREE Hands-on LLMs course (work in progress) & give it a ⭐ on GitHub to stay updated with its latest progress.
Extra: 3 key resources on how to monitor your ML models
In the last month, I read 100+ ML monitoring articles.
I trimmed them down to 3 key resources for you:
1. An excellent series of articles by Arize AI that will make you understand what ML monitoring is all about.
↳ Arize Articles
2. The Evidently AI Blog, where you can find answers to all your questions regarding ML monitoring.
↳ Evidently Blog
3. The hands-on monitoring examples hosted by DataTalksClub, which will teach you how to implement an ML monitoring system.
↳ DataTalks Course
After wasting a lot of time reading other resources...
Using these 3 resources is a solid start for learning about monitoring ML systems.
That's it for today!
See you next Thursday at 9:00 a.m. CET.
Have a fantastic weekend!
Paul
Whenever you're ready, here is how I can help you:
The Full Stack 7-Steps MLOps Framework: a 7-lesson FREE course that will walk you step-by-step through how to design, implement, train, deploy, and monitor an ML batch system using MLOps good practices. It contains the source code + 2.5 hours of reading & video materials on Medium.
Machine Learning & MLOps Blog: in-depth topics about designing and productionizing ML systems using MLOps.
Machine Learning & MLOps Hub: a place where all my work is aggregated in one place (courses, articles, webinars, podcasts, etc.).