Structure your Python AI code like a pro
Building a TikTok-like real-time personalized recommender using the 4-stage architecture
This week’s topics:
AI/ML engineering is a combination of SWE and AI
Building a TikTok-like real-time personalized recommender using the 4-stage architecture
Structure your Python ML code using SWE best practices
AI/ML engineering is a combination of SWE and AI
→ So stop playing with notebooks and level up your SWE skills with this fantastic platform.
The only way to learn SWE is by building applications of varying complexity.
One of my favorite platforms for learning SWE follows this same approach: CodeCrafters
They provide an interactive environment where you can code popular tools from scratch in multiple languages (Python, Rust, and Java): tools such as Redis, a shell, BitTorrent, SQLite and Kafka.
I've done their Redis series and loved it.
When I have more time, I plan to use it to improve my Rust skills.
The sad part is that they are not free. You have to pay a subscription.
The good part is that until the end of the year, you can get 𝗮 𝟳-𝗱𝗮𝘆 𝗳𝗿𝗲𝗲 𝘁𝗿𝗶𝗮𝗹 by signing up 𝘂𝘀𝗶𝗻𝗴 𝗺𝘆 𝗮𝗳𝗳𝗶𝗹𝗶𝗮𝘁𝗲 𝗹𝗶𝗻𝗸:
Which is more than enough to go through a series, such as building Kafka!
If you are disciplined and motivated enough 😂
Happy learning!
𝗡𝗼𝘁𝗲: If you subscribe, you also get 𝟰𝟬% 𝗼𝗳𝗳 using my affiliate link, or you can fully 𝗿𝗲𝗶𝗺𝗯𝘂𝗿𝘀𝗲 𝗶𝘁 through your corporate L&D budget.
Building a TikTok-like real-time personalized recommender using the 4-stage architecture
The problem with real-time recommenders is that you must narrow millions of items down to a few dozen candidates in less than a second, while keeping those candidates personalized to the user!
The 4-stage recommender architecture solves that!
Let's understand how to implement it using an AI lakehouse, such as Hopsworks.
The data flows in 2 ways:
An 𝗼𝗳𝗳𝗹𝗶𝗻𝗲 𝗽𝗶𝗽𝗲𝗹𝗶𝗻𝗲 that computes the candidate embeddings and loads them to a vector index in Hopsworks (offline batch mode).
It leverages the Items Candidate Encoder Model to compute embeddings for all the items in our database.
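To make this concrete, here is a minimal sketch of the offline step. The `encoder` and `vector_index` objects (and their `encode`/`upsert` methods, plus the `item_id` field) are hypothetical stand-ins for the Items Candidate Encoder Model and the Hopsworks vector index, not actual Hopsworks APIs:

```python
import numpy as np

def run_offline_pipeline(items: list[dict], encoder, vector_index, batch_size: int = 512) -> None:
    """Embed every item in the catalog and load the vectors into the index."""
    for start in range(0, len(items), batch_size):
        batch = items[start : start + batch_size]
        # The encoder maps raw item features to dense embeddings.
        embeddings = np.asarray(encoder.encode(batch))  # shape: (len(batch), emb_dim)
        # Upsert IDs + vectors so the online pipeline can run ANN search later.
        vector_index.upsert(
            ids=[item["item_id"] for item in batch],  # "item_id" is an assumed field name
            vectors=embeddings.tolist(),
        )
```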
.
An 𝗼𝗻𝗹𝗶𝗻𝗲 𝗽𝗶𝗽𝗲𝗹𝗶𝗻𝗲 that computes the actual recommendations for a customer (batch, async, real-time or streaming mode).
The online pipeline is split into 4 stages (as the name suggests), starting with the user's request and ending with the recommendations:
𝘚𝘵𝘢𝘨𝘦 1/
Take the customer_id and other input features (such as the current date), compute the customer embedding using the Customer Query Model, and query the Hopsworks vector DB for similar candidate items.
𝘙𝘦𝘥𝘶𝘤𝘦 𝘢 𝘤𝘰𝘳𝘱𝘶𝘴 𝘰𝘧 𝘮𝘪𝘭𝘭𝘪𝘰𝘯𝘴 𝘰𝘧 𝘪𝘵𝘦𝘮𝘴 𝘵𝘰 ~𝘩𝘶𝘯𝘥𝘳𝘦𝘥𝘴.
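A minimal Stage 1 sketch, assuming a `query_model` with an `encode()` method and a `vector_index` with a `search()` method (both illustrative stand-ins, not the exact Hopsworks API):

```python
from datetime import datetime, timezone

def retrieve_candidates(customer_id: str, query_model, vector_index, k: int = 200) -> list[str]:
    # Build the query features from the request (ID + request-time context).
    query_features = {
        "customer_id": customer_id,
        "date": datetime.now(timezone.utc).isoformat(),
    }
    # The Customer Query Model maps the features to an embedding...
    query_embedding = query_model.encode(query_features)
    # ...and an ANN search narrows millions of items down to ~k candidates.
    hits = vector_index.search(vector=query_embedding, top_k=k)
    return [hit.id for hit in hits]
```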
𝘚𝘵𝘢𝘨𝘦 2/
Take the candidate items and apply various filters, such as removing items already seen or purchased (using a Bloom filter, as sketched below).
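Here is a toy sketch of the filtering idea. A production system would maintain a pre-built, per-user Bloom filter; this minimal implementation just shows why the structure fits: membership checks are fast and compact, at the cost of occasional false positives (wrongly dropping an unseen item is acceptable here):

```python
import hashlib

class BloomFilter:
    """Toy Bloom filter: false positives are possible, false negatives are not."""

    def __init__(self, size_bits: int = 1 << 20, num_hashes: int = 5) -> None:
        self.size = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(size_bits // 8)

    def _positions(self, key: str):
        # Derive `num_hashes` bit positions from salted SHA-256 digests.
        for salt in range(self.num_hashes):
            digest = hashlib.sha256(f"{salt}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, key: str) -> None:
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, key: str) -> bool:
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(key))

def filter_candidates(candidate_ids: list[str], seen: BloomFilter) -> list[str]:
    # Drop every candidate the customer has (most likely) already seen or bought.
    return [item_id for item_id in candidate_ids if item_id not in seen]
```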
𝘚𝘵𝘢𝘨𝘦 3/
During ranking, we load more features from Hopsworks' feature store that describe the relationship between each item candidate and the customer: the "(item candidate, customer)" pair.
This is feasible because only a few hundred items are scored here, compared to the millions handled during candidate generation.
The ranking model can use a boosting tree, such as XGBoost or CatBoost, a neural network or even an LLM.
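A minimal Stage 3 sketch, assuming a hypothetical `feature_store.get_ranking_features()` helper (which returns rows aligned with the candidate IDs) and an sklearn-style fitted classifier, e.g. XGBoost, as the ranking model:

```python
import pandas as pd

def rank_candidates(customer_id: str, candidate_ids: list[str], feature_store, ranking_model) -> pd.DataFrame:
    # Enrich each (item candidate, customer) pair with features from the store.
    features: pd.DataFrame = feature_store.get_ranking_features(customer_id, candidate_ids)
    # Scoring a few hundred rows with a fitted classifier is cheap enough for real time.
    scores = ranking_model.predict_proba(features)[:, 1]  # P(positive interaction)
    return pd.DataFrame({"item_id": candidate_ids, "score": scores})
```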
𝘚𝘵𝘢𝘨𝘦 4/
We order the items by their ranking score, optionally apply extra business logic, and present the highest-scoring items to the user.
𝘙𝘦𝘥𝘶𝘤𝘦 𝘵𝘩𝘦 ~𝘩𝘶𝘯𝘥𝘳𝘦𝘥𝘴 𝘰𝘧 𝘤𝘢𝘯𝘥𝘪𝘥𝘢𝘵𝘦 𝘪𝘵𝘦𝘮𝘴 𝘵𝘰 ~𝘥𝘰𝘻𝘦𝘯𝘴.
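A minimal Stage 4 sketch, continuing the DataFrame from the ranking step and assuming it also carries a `category` column; the per-category cap is a made-up example of the optional business logic:

```python
import pandas as pd

def order_recommendations(ranked: pd.DataFrame, max_per_category: int = 3, top_n: int = 20) -> list[str]:
    # Sort by ranking score, highest first.
    ranked = ranked.sort_values("score", ascending=False)
    # Example business rule: keep at most `max_per_category` items per category
    # so the feed doesn't collapse into a single product type.
    ranked = ranked.groupby("category", sort=False).head(max_per_category)
    return ranked["item_id"].head(top_n).tolist()
```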
.
All these recommendations are computed in real time (in milliseconds).
As you interact with the platform, you generate new features that update the customer embedding, producing a new list of candidates that reflects your latest preferences.
...and so on, until you've wasted your whole day on TikTok 😰
Read more in our Hands-on H&M Real-Time Personalized Recommender free course:
Or dig directly into the:
Structure your Python ML code using SWE best practices
How do you use SWE best practices, such as DDD, to structure your Python ML code so that it is modular, extendable and scalable?
As ML in production is no longer a fairy tale, the MLE and SWE worlds converge.
One popular concept in the SWE world is Domain-Driven Design (DDD), which, in a nutshell, tells you how to structure your:
files
classes
interfaces
...to make your code modular, extendable and testable.
.
𝗠𝗮𝗶𝗻𝗹𝘆, 𝗗𝗗𝗗 𝗽𝗿𝗼𝘃𝗶𝗱𝗲𝘀 𝟰 𝗯𝗶𝗴 𝗯𝘂𝗰𝗸𝗲𝘁𝘀:
1. 𝘥𝘰𝘮𝘢𝘪𝘯: contains your core entities and interfaces
2. 𝘢𝘱𝘱𝘭𝘪𝘤𝘢𝘵𝘪𝘰𝘯: contains your business logic
3. 𝘪𝘯𝘧𝘳𝘢𝘴𝘵𝘳𝘶𝘤𝘵𝘶𝘳𝘦: implements interfaces with specific infrastructure logic
4. 𝘪𝘯𝘵𝘦𝘳𝘧𝘢𝘤𝘦𝘴: the layer that lets you interact with your code
𝘛𝘩𝘦 𝘳𝘦𝘭𝘢𝘵𝘪𝘰𝘯𝘴𝘩𝘪𝘱 𝘣𝘦𝘵𝘸𝘦𝘦𝘯 𝘵𝘩𝘦𝘴𝘦 𝘣𝘶𝘤𝘬𝘦𝘵𝘴 𝘪𝘴 𝘢𝘴 𝘧𝘰𝘭𝘭𝘰𝘸𝘴:
interfaces / infrastructure → application → domain
This allows you to write modular and flexible code that can quickly be shipped, understood, migrated (to a different infrastructure or interface) and tested.
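Here is a minimal, self-contained Python sketch of that dependency direction (all names are illustrative):

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass

# --- domain: core entities and interfaces, no external dependencies ---
@dataclass
class Document:
    id: str
    text: str

class DocumentRepository(ABC):
    @abstractmethod
    def save(self, document: Document) -> None: ...

# --- application: business logic, depends only on the domain ---
class IngestDocument:
    def __init__(self, repository: DocumentRepository) -> None:
        self.repository = repository

    def __call__(self, doc_id: str, text: str) -> None:
        # Core business rule lives here, independent of any infrastructure.
        self.repository.save(Document(id=doc_id, text=text.strip()))

# --- infrastructure: a concrete implementation, injected at runtime ---
class InMemoryDocumentRepository(DocumentRepository):
    def __init__(self) -> None:
        self.documents: dict[str, Document] = {}

    def save(self, document: Document) -> None:
        self.documents[document.id] = document

# --- interfaces: e.g. a tiny entry point that wires everything together ---
if __name__ == "__main__":
    ingest = IngestDocument(repository=InMemoryDocumentRepository())
    ingest("doc-1", "  Hello, DDD!  ")
```

Because `IngestDocument` only depends on the abstract `DocumentRepository`, swapping the in-memory version for a real database requires no change to the application layer.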
.
𝗛𝗼𝘄 𝘄𝗼𝘂𝗹𝗱 𝘁𝗵𝗶𝘀 𝗮𝗽𝗽𝗹𝘆 𝘁𝗼 𝗮𝗻 𝗠𝗟 𝗽𝗿𝗼𝗷𝗲𝗰𝘁?
Let's take an LLM and RAG application as an example.
1. The 𝗱𝗼𝗺𝗮𝗶𝗻 will contain the Prompt, Query, Document, Chunk, and Embedding entities.
2. The 𝗮𝗽𝗽𝗹𝗶𝗰𝗮𝘁𝗶𝗼𝗻 layer will contain the logic for collecting and preprocessing the data. It will also implement specific neural networks and advanced RAG chains.
3. The 𝗶𝗻𝗳𝗿𝗮𝘀𝘁𝗿𝘂𝗰𝘁𝘂𝗿𝗲 module will have the concrete implementations of your vector DB connectors.
The domain and application layers only know the vector DB connector interfaces. The implementations live in the infra layer and are injected at runtime. By doing so, you can easily swap between different infrastructure stacks (see the sketch after this list).
4. The 𝗶𝗻𝘁𝗲𝗿𝗳𝗮𝗰𝗲 layer exposes various facades to interact with your code.
For example, you can implement a simple CLI or something more complex, such as an orchestrator that chains various steps from the application layer.
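As an illustration of point 3, here is a minimal sketch of the connector-swapping idea (the class and function names are made up, not taken from an actual codebase):

```python
from typing import Protocol

# domain: the connector interface the application layer codes against
class VectorDBConnector(Protocol):
    def search(self, embedding: list[float], top_k: int) -> list[str]: ...

# infrastructure: one concrete implementation per vector DB
class QdrantConnector:
    def search(self, embedding: list[float], top_k: int) -> list[str]:
        raise NotImplementedError("call the real Qdrant client here")

class PineconeConnector:
    def search(self, embedding: list[float], top_k: int) -> list[str]:
        raise NotImplementedError("call the real Pinecone client here")

# application: only knows the interface; the implementation is injected
def retrieve_chunks(query_embedding: list[float], db: VectorDBConnector) -> list[str]:
    return db.search(query_embedding, top_k=5)
```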
There is a lot more to DDD, but this is its essence.
Also, if you start reading about DDD, you don't have to apply it 100%, as some of its concepts don't translate well to ML projects.
But it is a powerful mental model that lets you intuitively build modular and extendable code.
For a hands-on example, we applied some of these concepts in the LLM Engineer’s Handbook open-source repository:
Images
If not otherwise stated, all images are created by the author.