Multi-modal embeddings in recsys: Beyond text and images
Fine-tune any LLM in four simple steps. Hidden bottlenecks in your code? Not anymore.
This week's topics:
Multi-modal embeddings in recsys: Beyond text and images
Hidden bottlenecks in your code? Not anymore
Fine-tune any LLM in four simple steps
Multi-modal embeddings in recsys: Beyond text and images
Want to know how companies like H&M recommend the perfect items?
The Two-Tower Network is behind it all - here's how it works:
It embeds users and items into a shared vector space to enable fast and efficient retrieval of recommendations.
To compute the embeddings for users and items, we need something different from the standard encoders we use when working with text or image data.
The model architecture consists of two parallel networks known as towers:
Query Tower (Customer Query Encoder): This tower encodes user features such as customer_id, age, and other attributes to understand user preferences.
Candidate Tower (Item Encoder): This tower encodes item-specific features like article_id, garment_group_name, and other product attributes.
Both towers work independently but generate embeddings that live in the same low-dimensional space.
This shared space allows the model to compare users and items efficiently and determine relevant matches.
Each tower processes its respective inputs by:
Feature Encoding and Fusion: User and item features (both categorical and numerical) are converted into dense embeddings. For example, customer_id and article_id are mapped to vectors using an EmbeddingLayer, while categorical features like garment_group_name use a one-hot encoding layer.
Neural Network Refinement: A feedforward neural network (FNN) with multiple dense layers refines these embeddings, reducing them to a low-dimensional space. This compact representation helps prevent overfitting and ensures more generalized recommendations.
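To make the architecture concrete, here is a minimal sketch of one tower in PyTorch. This is my own illustration, not the exact H&M implementation: the feature names, vocabulary sizes, and dimensions are assumptions for demonstration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Tower(nn.Module):
    """One tower: embed the ID, one-hot encode a categorical feature,
    fuse them, and refine with a small feedforward network."""

    def __init__(self, num_ids, num_categories, emb_dim=64, out_dim=16):
        super().__init__()
        self.id_embedding = nn.Embedding(num_ids, emb_dim)  # e.g. customer_id / article_id
        self.num_categories = num_categories                # e.g. garment_group_name
        self.ffn = nn.Sequential(                           # refine into a low-dimensional space
            nn.Linear(emb_dim + num_categories, 128),
            nn.ReLU(),
            nn.Linear(128, out_dim),
        )

    def forward(self, ids, category):
        fused = torch.cat(
            [self.id_embedding(ids), F.one_hot(category, self.num_categories).float()],
            dim=-1,
        )
        # L2-normalize so a dot product between towers equals cosine similarity
        return F.normalize(self.ffn(fused), dim=-1)

# The towers are independent but share the same 16-dimensional output space
query_tower = Tower(num_ids=100_000, num_categories=20)     # users
candidate_tower = Tower(num_ids=50_000, num_categories=20)  # items

user_vec = query_tower(torch.tensor([42]), torch.tensor([3]))
item_vec = candidate_tower(torch.tensor([1337]), torch.tensor([7]))
score = (user_vec * item_vec).sum(-1)  # relevance of this item for this user
```

At training time, the two towers are optimized jointly so that the embeddings of users and the items they interacted with end up close together in the shared space.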
But why do low-dimensional embeddings matter?
Because without limiting the dimensions, the model might overfit by memorizing past purchases.
(meaning users would receive repetitive recommendations of items they already bought)
By balancing collaborative filtering and content-based filtering, the Two-Tower Model provides both personalized and diverse recommendations.
To find the right balance, experiment: add more features to both towers, measure the results, compare them, and repeat.
Why?
The more user and item features you include, the more the model leans toward content-based filtering.
(resulting in more diverse and dynamic recommendations)
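At serving time, the payoff of the shared embedding space is that recommendation reduces to a nearest-neighbor search. Here is a toy sketch; the shapes and k are illustrative, and a production system would use an approximate nearest-neighbor index rather than a full matrix multiply:

```python
import torch
import torch.nn.functional as F

# Offline: embed the full catalog with the candidate tower (random stand-ins here)
item_embeddings = F.normalize(torch.randn(50_000, 16), dim=-1)

# Online: embed the user once, then score every item with a single matrix multiply
user_embedding = F.normalize(torch.randn(16), dim=-1)
scores = item_embeddings @ user_embedding         # cosine similarity per item
top_scores, top_items = torch.topk(scores, k=10)  # the 10 best candidates
print(top_items.tolist())
```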
Learn more about implementing and training the Two-Tower Network in our "Training pipelines for TikTok-like recommenders" article →
Hidden bottlenecks in your code? Not anymore (Sponsored)
Every AI developer knows this struggle:
Optimizing AI systems and large-scale applications.
You're working on your first AI project, connecting all the components and algorithms that seem perfect in theory.
But when you hit run, your application drags and takes ages to process.
Why?
It's the bottlenecks.
Maybe it's I/O: reading images or text data sequentially instead of asynchronously, or poorly parallelized API calls to models like OpenAI's.
Perhaps you're not batching your inputs properly, or the code isn't optimized for parallel processing - and this is all while running locally.
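To illustrate how much sequential I/O costs, here is a toy comparison using Python's asyncio. The fetch function is a hypothetical stand-in for any I/O-bound call, such as a file read or a model API request:

```python
import asyncio
import time

async def fetch(i):
    """Hypothetical stand-in for one I/O-bound call (file read, HTTP request, model API)."""
    await asyncio.sleep(0.5)  # simulated network / disk latency
    return f"result-{i}"

async def main():
    # Sequential: each call waits for the previous one -> ~5 s for 10 calls
    start = time.perf_counter()
    results = [await fetch(i) for i in range(10)]
    print(f"sequential: {time.perf_counter() - start:.1f}s")

    # Concurrent: all 10 calls in flight at once -> ~0.5 s
    start = time.perf_counter()
    results = await asyncio.gather(*(fetch(i) for i in range(10)))
    print(f"concurrent: {time.perf_counter() - start:.1f}s")

asyncio.run(main())
```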
When you push your code to production (especially on Kubernetes clusters), these inefficiencies multiply and become harder to detect.
That's where profilers come in...
They help identify these bottlenecks and guide optimization.
But most traditional profiling tools have limitations:
Too slow.
Too basic.
Only applicable to local environments.
None of them solves the complexity of modern AI workflows at scale.
That's why I'm excited to share Perforator by Yandex.
It's an open-source, distributed profiler that's available for all developers on GitHub.

What makes it notable?
It profiles in real time.
It identifies which lines of code are costly and reveals performance blind spots.
It works seamlessly across local programs and Kubernetes clusters (as long as they run Linux).
It can profile CPU, GPU, and I/O.
The developers estimate it may save you up to 20% on infrastructure costs.
Iโve personally seen how these bottlenecks can slow down AI projects.
And it taught me an important lesson:
Having a tool to constantly monitor and optimize your system's performance is essential.
What makes this one even better is that it's open source and free for everyone.
This is a must-have for engineering teams and CTOs who want to maintain code quality and keep infrastructure costs under control.
Learn more about Perforator in their article:
Or consider directly testing out the tool:
Fine-tune any LLM in four simple steps
It's crazy how easy it is to fine-tune LLMs at scale with Unsloth AI, Hugging Face, and SageMaker. Here are 4 simple steps:
1. Load a base model of choice from the Hugging Face model registry.
2. Load an instruction dataset of choice from the Hugging Face data registry. Format it using your prompt template and create a train-test split.
3. Use the SFTTrainer from trl for SFT fine-tuning (or the DPOTrainer for preference alignment).
4. Wrap the fine-tuning script with a HuggingFace SageMaker class and deploy it to EC2 instances.
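To make the steps concrete, here is a minimal sketch. The model and dataset names are placeholders I picked for illustration, and the exact SFTTrainer/SFTConfig arguments vary across trl versions:

```python
from datasets import load_dataset
from trl import SFTConfig, SFTTrainer

# Steps 1 + 2: base model and instruction dataset (both are example choices)
model_name = "meta-llama/Llama-3.1-8B"
dataset = load_dataset("mlabonne/FineTome-100k", split="train")
dataset = dataset.train_test_split(test_size=0.05)  # train-test split

# Step 3: supervised fine-tuning with trl's SFTTrainer
trainer = SFTTrainer(
    model=model_name,  # SFTTrainer can load the model and tokenizer from the hub
    args=SFTConfig(output_dir="./sft-output", num_train_epochs=1),
    train_dataset=dataset["train"],
    eval_dataset=dataset["test"],
)
trainer.train()

# Step 4 (sketch): wrap this script with the HuggingFace SageMaker estimator, e.g.
#   from sagemaker.huggingface import HuggingFace
#   HuggingFace(entry_point="train.py", instance_type="ml.g5.2xlarge",
#               instance_count=1, role="<your-sagemaker-role>",
#               transformers_version="4.36", pytorch_version="2.1",
#               py_version="py310").fit()
```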
Using this template, you can easily swap:
your base LLM
your dataset
your AWS infrastructure
All based on your fine-tuning scale and task needs. Since the engineering side is mostly solved by templates like this, today's real AI challenges are about:
gathering raw data
cleaning your data
having enough $$$ for compute.
You can find the complete code in our LLM Engineer's Handbook GitHub repository - ready to be adapted to your use case:
Whenever you're ready, there are 3 ways we can help you:
Perks: Exclusive discounts on our recommended learning resources
(live courses, self-paced courses, learning platforms and books).
The LLM Engineer's Handbook: Our bestseller book on mastering the art of engineering Large Language Model (LLM) systems from concept to production.
Free open-source courses: Master production AI with our end-to-end open-source courses, which reflect real-world AI projects, covering everything from system architecture to data collection and deployment.
Images
If not otherwise stated, all images are created by the author.