DML: Synced Vector DBs - A Guide to Streaming Pipelines for Real-Time RAG in Your LLM Applications
Hello there, I am Paul Iusztin 👋🏼
Within this newsletter, I will help you decode complex topics about ML & MLOps one week at a time 🔥
This week's ML & MLOps topics:
Synced Vector DBs - A Guide to Streaming Pipelines for Real-Time RAG in Your LLM Applications
Story: If anyone told you that ML or MLOps is easy, they were right. A simple trick I learned the hard way.
This week's newsletter is shorter than usual, but I have some great news 🔥
Next week, within the Decoding ML newsletter, I will start a step-by-step series based on the Hands-On LLMs course I am developing.
By the end of this series, you will know how to design, build, and deploy a financial assistant powered by LLMs.
…all of this for FREE inside the Decoding ML newsletter.
↳🔗 Check out the Hands-On LLMs course GitHub page and give it a star to stay updated with our progress.
#1. Synced Vector DBs - A Guide to Streaming Pipelines for Real-Time RAG in Your LLM Applications
To successfully use RAG in your LLM applications, your vector DB must constantly be updated with the latest data.
Here is how you can implement a streaming pipeline to keep your vector DB in sync with your datasets ↓
.
RAG is a popular strategy for building LLM applications that adds context about your private datasets to the prompt.
Leveraging your domain data using RAG provides 2 significant benefits:
- you don't need to fine-tune your model as often (or at all)
- you avoid hallucinations, since the answers are grounded in retrieved context
.
On the bot side, to implement RAG, you have to (steps #1-#2 belong to the ingestion pipeline, described below):
3. Embed the user's question using an embedding model (e.g., BERT). Use the embedding to query your vector DB and find the most similar vectors using a distance function (e.g., cosine similarity).
4. Get the top N closest vectors and their metadata.
5. Attach the metadata of the top N vectors + the chat history to the input prompt.
6. Pass the prompt to the LLM.
7. Insert the user question + assistant answer into the chat history.
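Steps 3-5 can be sketched in a few lines of plain Python. This is a minimal, self-contained illustration: the `embed` function and the in-memory `VECTOR_DB` are hypothetical stand-ins for a real embedding model (e.g., BERT) and a real vector DB (e.g., Qdrant), not their actual APIs.

```python
import math

# Toy in-memory "vector DB": (vector, metadata) pairs.
# In production, the vectors come from an embedding model
# and live in a real vector DB such as Qdrant.
VECTOR_DB = [
    ([1.0, 0.0], {"text": "Stocks rallied on strong earnings."}),
    ([0.0, 1.0], {"text": "The central bank raised interest rates."}),
    ([0.7, 0.7], {"text": "Tech shares led the market higher."}),
]

def cosine(a: list[float], b: list[float]) -> float:
    # Distance function used to rank vectors (step 3).
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def embed(question: str) -> list[float]:
    # Hypothetical stand-in for a real embedding model.
    return [1.0, 0.1] if "stock" in question.lower() else [0.1, 1.0]

def build_prompt(question: str, chat_history: list[str], top_n: int = 2) -> str:
    q_vec = embed(question)
    # Steps 3-4: query the DB and keep the top N most similar vectors.
    hits = sorted(VECTOR_DB, key=lambda item: cosine(q_vec, item[0]), reverse=True)[:top_n]
    context = "\n".join(meta["text"] for _, meta in hits)
    # Step 5: attach the retrieved metadata + chat history to the prompt.
    history = "\n".join(chat_history)
    return f"Context:\n{context}\n\nHistory:\n{history}\n\nQuestion: {question}"

prompt = build_prompt("What moved stocks today?", chat_history=[])
```

The resulting `prompt` is what you pass to the LLM in step 6; step 7 is just appending the question and answer to `chat_history`.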
.
But the question is, how do you keep your vector DB up to date with the latest data?
↳ You need a real-time streaming pipeline.
How do you implement it?
You need 2 components:
↳ A stream processing framework. For example, Bytewax is built in Rust for efficiency and exposes a Python interface for ease of use - you don't need Java to implement real-time pipelines anymore.
🔗 Bytewax
↳ A vector DB. For example, Qdrant provides a rich set of features and a seamless experience.
🔗 Qdrant
.
Here is an example of how to implement a streaming pipeline for financial news ↓
#1. Financial news data source (e.g., Alpaca):
To populate your vector DB, you need a historical API (e.g., a RESTful API) that lets you ingest data in batch mode over a desired [start_date, end_date] range. You can tweak the number of workers to parallelize this step as much as possible.
→ You run this once, in the beginning.
You also need the data exposed through a web socket to ingest news in real time, so you can listen for news and insert it into your vector DB as soon as it is available.
→ Listens 24/7 for financial news.
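The two ingestion modes can be sketched as two generators that feed the same downstream processing. Everything here is a hypothetical stand-in - the real Alpaca REST and web-socket clients have different APIs; the point is only the batch-backfill vs. 24/7-stream split.

```python
from datetime import date

# Hypothetical stand-in for the historical REST API's data.
HISTORY = [
    {"date": date(2023, 10, 1), "headline": "Q3 earnings beat estimates"},
    {"date": date(2023, 10, 2), "headline": "Oil prices spike"},
]

def batch_source(start: date, end: date):
    # Runs once at startup: backfills the vector DB over [start, end].
    for item in HISTORY:
        if start <= item["date"] <= end:
            yield item

def stream_source(socket_messages):
    # Runs 24/7: each web-socket message is ingested as it arrives.
    for msg in socket_messages:
        yield msg

# Both sources feed the exact same downstream processing steps.
ingested = list(batch_source(date(2023, 10, 1), date(2023, 10, 1)))
ingested += list(stream_source([{"date": date(2023, 10, 3), "headline": "Fed holds rates"}]))
```

In a real pipeline, `stream_source` would wrap a long-lived web-socket connection instead of a finite list.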
#2. Build the streaming pipeline using Bytewax:
Implement 2 input connectors for the 2 different types of APIs: RESTful API & web socket.
The rest of the steps can be shared between both connectors ↓
- Clean the financial news documents.
- Chunk the documents.
- Embed the documents (e.g., using BERT).
- Insert the embedded documents + their metadata into the vector DB (e.g., Qdrant).
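The shared clean → chunk → embed → insert steps above can be sketched as plain functions. This is a toy sketch, not Bytewax or Qdrant code: `embed` is a trivial stand-in for BERT, and `vector_db` is an in-memory list standing in for Qdrant.

```python
def clean(doc: str) -> str:
    # Normalize whitespace; real cleaning is more involved (HTML, boilerplate, etc.).
    return " ".join(doc.split())

def chunk(doc: str, size: int = 5) -> list[str]:
    # Split into fixed-size word chunks so each fits the embedding model's input.
    words = doc.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]

def embed(text: str) -> list[float]:
    # Hypothetical stand-in for a real model like BERT.
    return [float(len(text)), float(text.count(" "))]

vector_db: list[dict] = []  # in-memory stand-in for Qdrant

def insert(chunk_text: str, source: str) -> None:
    # Store the embedding together with its metadata (payload).
    vector_db.append({
        "vector": embed(chunk_text),
        "payload": {"text": chunk_text, "source": source},
    })

# The same steps run for documents from either connector
# (RESTful API or web socket).
raw = "  Markets   closed higher today after strong tech earnings lifted sentiment  "
for c in chunk(clean(raw)):
    insert(c, source="historical-api")
```

In Bytewax, each of these functions would become a step in the dataflow, applied to every document flowing in from the connectors.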
#3-7. When the users ask a financial question, you can leverage RAG with an up-to-date vector DB to search for the latest news in the industry.
#Story. If anyone told you that ML or MLOps is easy, they were right. A simple trick I learned the hard way.
If anyone told you that ML or MLOps is easy, they were right.
Here is a simple trick that I learned the hard way ↓
If you are in this domain, you already know that everything changes fast:
- a new tool every month
- a new model every week
- a new project every day
You know what I did? I stopped caring about all these changes and switched my attention to the real gold.
Which is → "Focus on the fundamentals."
.
Let me explain ↓
When you constantly chase the latest models (aka FOMO), you will only gain a shallow understanding of that new information (unless you are a genius or already deep into that niche).
But the joke's on you. In reality, most of what you think you need to know, you don't.
So you won't use what you learned, and you'll forget most of it after 1-2 months.
What a waste of time, right?
.
But...
If you master the fundamentals of the topic you want to learn...
For example, for deep learning, you have to know:
- how models are built
- how they are trained
- groundbreaking architectures (ResNet, UNet, Transformers, etc.)
- parallel training
- deploying a model, etc.
...when in need (e.g., you just moved on to a new project), you can easily pick up the latest research.
Thus, after you have laid the foundation, it is straightforward to learn SoTA approaches when needed (if needed).
Most importantly, what you learn will stick with you, and you will have the flexibility to jump from one project to another quickly.
.
I am also guilty. I used to FOMO into all kinds of topics until I was honest with myself and admitted I was no Leonardo da Vinci.
But here is what I did that worked well:
- building projects
- replicating the implementations of famous papers
- teaching the subject I want to learn
... and, most importantly, taking my time to relax and internalize the information.
To conclude:
- learn ahead only the fundamentals
- learn the latest trend only when needed
That's it for today!
See you next Thursday at 9:00 a.m. CET.
Have a fantastic weekend!
…and see you next week for the beginning of the Hands-On LLMs series 🔥
Paul
Whenever youโre ready, here is how I can help you:
The Full Stack 7-Steps MLOps Framework: a 7-lesson FREE course that will walk you step-by-step through how to design, implement, train, deploy, and monitor an ML batch system using MLOps good practices. It contains the source code + 2.5 hours of reading & video materials on Medium.
Machine Learning & MLOps Blog: in-depth topics about designing and productionizing ML systems using MLOps.
Machine Learning & MLOps Hub: a place where all my work is aggregated in one place (courses, articles, webinars, podcasts, etc.).