Decoding ML #012: This Is My Favorite Software Design Pattern You Must Know
My Favorite Software Design Pattern That You Must Know as an MLE. Unify Batch and Streaming ML Pipelines.
Hello there, I am Paul Iusztin 👋🏼
Within this newsletter, I will help you decode complex topics about ML & MLOps one week at a time 🔥
This week we will cover:
My Favorite Software Design Pattern That You Must Know as an MLE
Unify Batch and Streaming ML Pipelines
But first,
If you want to quickly learn how to design & build an end-to-end ML batch architecture using MLOps good practices,
I want to let you know that:
I presented an overview of "The Full Stack 7-Steps MLOps Framework" course.
During the webinar, I had the chance to explain how all the puzzle pieces (aka architecture components) of a batch architecture work together.
If you want to understand how to design:
- a batch architecture
- feature, training, and inference pipelines
- orchestration
- data validation & monitoring
- a web app using FastAPI & Streamlit
- deployment & CI/CD pipelines
- the adaptation of the batch architecture to an online system
Then this recording of the 1-hour webinar might be just for you.
#1. My Favorite Software Design Pattern That You Must Know as an MLE
This is my favorite design pattern that you must know as an ML engineer.
Most ML engineers completely ignore software design patterns, but let me explain why you should know this one for your machine learning projects.
I am talking about Composite.
The Composite pattern is a structural design pattern that helps you compose objects in a tree-like structure.
Let me explain by starting with the problem.
Problem
Let's say that you want to build an ML pipeline that performs object detection + tracking.
You can easily divide it into smaller pipelines, such as:
1. preprocessing
2. training | inference
3. postprocessing
In turn, each of these 3 pipelines is split into smaller components.
Let's say that, to speed up the ML pipeline, you want to run everything possible in parallel.
Thus, depending on the use case, it would be best to have a module that composes components either sequentially or in parallel.
If you don't think this through, your code can quickly turn into spaghetti.
Solution
Now, the Composite design pattern kicks in.
-> This is how you can implement the ML pipeline above using the Composite pattern:
1. Define a standard interface for all the transformations. Let's call it "Transformation."
2. For atomic transformations, we create an abstract class called "AtomicTransformation" that inherits from the "Transformation" interface.
3. For running multiple transformations, we implement an abstract class called "CompositeTransformation". This class also inherits from the "Transformation" interface, but it additionally takes a list of "Transformation" objects as input.
4. Depending on how you want to run a group of transformations, you can inherit from the "CompositeTransformation" class and implement classes such as:
- "SequenceTransformations"
- "ParallelTransformations"
- "DistributedTransformations", etc.
5. Now, when you want to implement a granular transformation (e.g., normalizing the image), you implement the "AtomicTransformation" interface.
6. When you want to glue multiple transformations together, you leverage the "CompositeTransformation" classes.
7. When you call a "CompositeTransformation", under the hood it calls each "Transformation" object in its list, descending through any nested composites until it hits an "AtomicTransformation" object, which does the actual transformation.
Note that because both "AtomicTransformation" and "CompositeTransformation" inherit from the "Transformation" interface, you can use them interchangeably, like LEGOs.
That is powerful.
That is why we all love Sklearn and its "Pipeline" interface 🔥
If you want to know how to apply other software design patterns in ML engineering, here is another article I wrote that you might like: 10 Underrated Software Patterns Every ML Engineer Should Know
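Here is a minimal Python sketch of steps 1 to 7. The class names come from the list above; the concrete leaf transformations (Clip, Normalize, Mean, Maximum), the flat list of pixel values standing in for an image, and the choice that "ParallelTransformations" returns one output per child (close in spirit to scikit-learn's FeatureUnion) are illustrative assumptions. "DistributedTransformations" is left out.

```python
from abc import ABC, abstractmethod
from concurrent.futures import ThreadPoolExecutor
from typing import Any, List


class Transformation(ABC):
    """Step 1: the common interface for every transformation."""

    @abstractmethod
    def __call__(self, data: Any) -> Any:
        """Transforms the input and returns the result."""


class AtomicTransformation(Transformation, ABC):
    """Step 2: a leaf that performs one granular transformation."""


class CompositeTransformation(Transformation, ABC):
    """Step 3: a node that holds child transformations (atomic or composite)."""

    def __init__(self, transformations: List[Transformation]) -> None:
        self.transformations = transformations


class SequenceTransformations(CompositeTransformation):
    """Step 4: feeds the output of each child into the next one."""

    def __call__(self, data: Any) -> Any:
        for transformation in self.transformations:
            data = transformation(data)
        return data


class ParallelTransformations(CompositeTransformation):
    """Step 4: runs every child on the same input concurrently.

    Returns the children's outputs as a list, in the children's order
    (executor.map preserves input order).
    """

    def __call__(self, data: Any) -> List[Any]:
        with ThreadPoolExecutor() as executor:
            return list(executor.map(lambda t: t(data), self.transformations))


# Step 5: hypothetical atomic transformations on a flat list of pixel values.
class Clip(AtomicTransformation):
    def __call__(self, data: List[float]) -> List[float]:
        return [min(max(x, 0.0), 255.0) for x in data]


class Normalize(AtomicTransformation):
    def __call__(self, data: List[float]) -> List[float]:
        return [x / 255.0 for x in data]


class Mean(AtomicTransformation):
    def __call__(self, data: List[float]) -> float:
        return sum(data) / len(data)


class Maximum(AtomicTransformation):
    def __call__(self, data: List[float]) -> float:
        return max(data)


# Step 6: glue transformations together; a composite nests inside another one.
preprocessing = SequenceTransformations([Clip(), Normalize()])
pipeline = SequenceTransformations(
    [preprocessing, ParallelTransformations([Mean(), Maximum()])]
)

# Step 7: one call walks the whole tree down to the atomic leaves.
print(pipeline([-10, 0, 255, 300]))  # [0.5, 1.0]
```

Because "preprocessing" is itself a "Transformation", it plugs into the outer pipeline exactly like an atomic step. Threads help mainly with I/O-bound work or steps that release the GIL (as many NumPy and OpenCV routines do); for pure-Python CPU-bound steps, a process pool would be the usual swap.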
#2. Unify Batch and Streaming ML Pipelines
What happens if you want to introduce a real-time/streaming data source into your system?
You cry. Just kidding. It is a lot easier than it sounds.
Let's get some context.
Until now, you used only a static data source to train your model & compute your features.
But you find out that your business wants to use real-time news feeds as features for your model.
What do you do?
You have to implement 2 main pipelines for your new streaming input source:
#1. One that quickly transforms the raw data into features and loads them into the feature store, where the production services can access them.
#2. One that stores the raw data in the static raw data source (e.g., a warehouse) so it can be used later for experimentation and research.
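The two pipelines above can be sketched as a fan-out: every incoming event is sent to both. This is a plain-Python illustration, not a streaming framework; the "NewsEvent" schema, the "headline_length" feature, and the in-memory "FeatureStore" and "Warehouse" stand-ins are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List


@dataclass
class NewsEvent:
    """Hypothetical, already-standardized news feed message."""
    ticker: str
    headline: str
    timestamp: float


@dataclass
class FeatureStore:
    """In-memory stand-in for the online feature store (hypothetical)."""
    features: Dict[str, Dict[str, float]] = field(default_factory=dict)


@dataclass
class Warehouse:
    """In-memory stand-in for the static raw data source (hypothetical)."""
    rows: List[NewsEvent] = field(default_factory=list)


def feature_pipeline(event: NewsEvent, store: FeatureStore) -> None:
    # Pipeline #1: turn the raw event into features the production services can read.
    store.features[event.ticker] = {
        "headline_length": float(len(event.headline)),
        "last_news_timestamp": event.timestamp,
    }


def raw_data_pipeline(event: NewsEvent, warehouse: Warehouse) -> None:
    # Pipeline #2: keep the untouched raw event for experimentation and research.
    warehouse.rows.append(event)


def consume(events: Iterable[NewsEvent], store: FeatureStore, warehouse: Warehouse) -> None:
    # Every event from the stream is fanned out to both pipelines.
    for event in events:
        feature_pipeline(event, store)
        raw_data_pipeline(event, warehouse)


store, warehouse = FeatureStore(), Warehouse()
consume(
    [
        NewsEvent("AAPL", "Apple beats estimates", 1_700_000_000.0),
        NewsEvent("AAPL", "Apple opens new store", 1_700_000_060.0),
    ],
    store,
    warehouse,
)
print(store.features["AAPL"])  # features from the latest AAPL event only
print(len(warehouse.rows))  # 2
```

In production, each pipeline would typically run as its own Kafka consumer in a separate consumer group, so both receive every message while scaling and failing independently; the single loop here only shows the fan-out.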
Before being ingested into your system, the real-time data might need an extra processing step to standardize it and adapt its format to your interface.
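A minimal sketch of such a standardization step, assuming a hypothetical raw feed that sends "symbol", "title", and an ISO 8601 "published_at" field. The field names, the "NewsEvent" target format, and the rule that naive timestamps are UTC are all assumptions for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Any, Dict


@dataclass
class NewsEvent:
    """The internal format the rest of the system expects (hypothetical)."""
    ticker: str
    headline: str
    timestamp: float  # Unix seconds, UTC


REQUIRED_FIELDS = ("symbol", "title", "published_at")


def standardize(raw: Dict[str, Any]) -> NewsEvent:
    """Adapts one raw message from a hypothetical news feed to NewsEvent."""
    missing = [name for name in REQUIRED_FIELDS if name not in raw]
    if missing:
        raise ValueError(f"raw event is missing fields: {missing}")
    published_at = datetime.fromisoformat(raw["published_at"])
    if published_at.tzinfo is None:
        # Assumption: the feed sends naive timestamps in UTC.
        published_at = published_at.replace(tzinfo=timezone.utc)
    return NewsEvent(
        ticker=raw["symbol"].strip().upper(),
        headline=raw["title"].strip(),
        timestamp=published_at.timestamp(),
    )


event = standardize(
    {
        "symbol": " aapl ",
        "title": "  Apple beats estimates ",
        "published_at": "2024-01-15T09:30:00+00:00",
    }
)
print(event.ticker, event.headline)  # AAPL Apple beats estimates
```

Rejecting malformed messages at this boundary keeps both downstream pipelines working against a single, predictable schema.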
A standard strategy for #1 is:
- Kafka as your streaming platform
- Flink/Kafka Streams as your stream processing units
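To give a feel for what the stream processing unit computes in #1, here is a plain-Python sketch of a tumbling-window count (a hypothetical "news count per ticker per minute" feature). It is not Flink or Kafka Streams code; those engines add keyed state, fault tolerance, and handling of out-of-order events, and they emit each window's result as the window closes rather than after reading a finite list.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def tumbling_window_counts(
    events: Iterable[Tuple[str, float]], window_seconds: int = 60
) -> Dict[Tuple[str, int], int]:
    """Counts events per (ticker, window start) using non-overlapping windows."""
    counts: Dict[Tuple[str, int], int] = defaultdict(int)
    for ticker, timestamp in events:
        # Each event falls into exactly one window, keyed by the window's start time.
        window_start = int(timestamp // window_seconds) * window_seconds
        counts[(ticker, window_start)] += 1
    return dict(counts)


events = [("AAPL", 0.0), ("AAPL", 30.0), ("MSFT", 45.0), ("AAPL", 61.0)]
print(tumbling_window_counts(events))
# {('AAPL', 0): 2, ('MSFT', 0): 1, ('AAPL', 60): 1}
```

Each resulting count would then be written to the feature store under its (ticker, window) key.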
For step #2, most of the time, you will have access to out-of-the-box data connectors that quickly load the real-time data into your static data storage (e.g., from Kafka to an S3 bucket or a BigQuery data warehouse).
To conclude...
To add a streaming data source to your current infrastructure, you need the following:
- Kafka
- Flink/Kafka Streams
- a pipeline that moves your streaming data into your static data source
- a pipeline that quickly computes features and loads them into the feature store
Thus, it isn't hard; it's just a lot of infrastructure to set up.
That's it for today 👾
See you next Thursday at 9:00 am CET.
Have a fantastic weekend!
Paul
Whenever you're ready, here is how I can help you:
The Full Stack 7-Steps MLOps Framework: a 7-lesson FREE course that will walk you step-by-step through how to design, implement, train, deploy, and monitor an ML batch system using MLOps good practices. It contains the source code + 2.5 hours of reading & video materials on Medium.
Machine Learning & MLOps Blog: here, I approach in-depth topics about designing and productionizing ML systems using MLOps.