Lambda architecture’s comeback for AI systems
This week’s topics:
Lambda architecture’s comeback for AI systems
How to become a top 1% AI/ML Engineer
The evolution of RAG, semantic search, and retrieval systems
Lambda architecture’s comeback for AI systems
Ever wondered how to unify your batch and streaming feature processing pipelines while keeping data science, data engineering, and MLE teams on the same page?
Let’s explore how a well-implemented Lambda architecture can seamlessly bring these teams together under a single codebase, without code duplication or training-serving skew.
There is an ongoing debate about whether to use the Lambda or Kappa architecture to build your big data infrastructure and serve features (offline and online) to AI models at scale.
While the Kappa architecture is often praised, with the correct tooling, the Lambda design can give you the best of both worlds:
Flexibility
Robustness
Ease of use
Before digging deeper, let’s have a quick reminder of these two architectures:
Lambda: A complex design that uses separate batch and stream processing systems. It can handle any data load, but its big cons are that it’s complex to set up and maintain, and you often end up with duplicated logic, where the batch and stream processing engines can produce different results.
Kappa: Simplifies the pipeline with a single stream processing system, treating all data as streams. It’s simpler to set up and maintain, but it can still introduce complexity for businesses unfamiliar with stream processing frameworks such as Apache Flink. Another con is that it’s more costly due to its heavy reliance on streaming infrastructure.
Why Lambda can still shine
A Lambda architecture is cheaper and more flexible. It also lets you leverage batch processing for high-throughput offline operations and stream processing for low-latency online operations.
However, in the AI world, the Lambda architecture is often criticized for duplicating the code between batch and streaming, as it can introduce nasty training-serving skew issues.
But with modern feature stores and platforms, that’s not true anymore.
For example, you no longer have these issues with a feature platform like Tecton, which offers a Python SDK that unifies your streaming and batch code into a single Python module.
This eliminates the main cons of the Lambda architecture: Tecton’s engine (Rift) abstracts away the infrastructure complexity, while its Python SDK removes the code duplication.
Example
Let’s see how this works in practice by computing features for predicting Bitcoin prices, which requires both batch and stream data sources.
First, we define a batch and stream source for ingesting raw Bitcoin prices using Tecton’s StreamSource class, which supports both. The data can come from any external storage, such as an API, Snowflake, BigQuery, S3, Databricks, etc.:
from tecton import FileConfig, PushConfig, StreamSource
from tecton.types import Field, Float64, String, Timestamp

stream_btc_prices = StreamSource(
    name="btc_prices",
    # Real-time events are pushed directly to Tecton's ingestion API.
    stream_config=PushConfig(),
    # Historical data is backfilled from a Parquet file in S3.
    batch_config=FileConfig(
        uri="s3://some_url/btc_prices.pq",
        file_format="parquet",
        timestamp_field="timestamp",
    ),
    schema=[Field("user_id", String), Field("timestamp", Timestamp), Field("price", Float64)],
)
Next, we define a SINGLE stream feature view for computing price moving averages. It supports batch mode for high-throughput offline training and streaming mode for computing the same features online with low latency at inference time:
from datetime import timedelta

from tecton import Aggregate, stream_feature_view
from tecton.types import Field, Float64

import my_awesome_feature  # our transformation logic, a standard Python module

@stream_feature_view(
    description="Bitcoin prices over 1, 3 and 7 days",
    source=stream_btc_prices,
    mode="pandas",
    online=True,   # materialize to the online store for low-latency serving
    offline=True,  # materialize to the offline store for training
    aggregation_interval=timedelta(days=1),
    timestamp_field="timestamp",
    features=[
        Aggregate(input_column=Field("price", Float64), function="mean", time_window=timedelta(days=1)),
        Aggregate(input_column=Field("price", Float64), function="mean", time_window=timedelta(days=3)),
        Aggregate(input_column=Field("price", Float64), function="mean", time_window=timedelta(days=7)),
    ],
    batch_schedule=timedelta(days=1),
)
def prices_features(prices_stream):
    return my_awesome_feature.compute(prices_stream)
The key here is that by leveraging a single feature view defined in Python, we can do both batch and streaming, eliminating potential code duplication down the line. Note how we added batch-specific parameters, such as `aggregation_interval` and `batch_schedule`, to the feature view.
You can write the feature view code in raw Python, Pandas, SQL, or Spark.
By computing the features using standard Python code, we also avoid using streaming processing frameworks that may have additional engineering overhead.
This behavior is possible because of Tecton’s Rift engine, which handles different data sources and unifies them under its Python SDK. We can even replicate the same behavior for real-time features computed on demand during inference.
Now, with a single CLI command, we can materialize our features into the serving layer: the offline store for training in batches (high throughput, storing historical data) and the online store (low latency, storing only the latest values).
tecton apply
Now, we can easily train our price prediction model on top of the features computed from Bitcoin prices and deploy it at low latency to predict prices in real time.
Coinbase uses a similar strategy to implement its AI models for predicting crypto movements using Tecton’s Rift engine and Python SDK.
Funny note: For people familiar with infrastructure as code (IaC) tools, note how Tecton treats features as code, where every modification to your feature set follows a similar strategy to Terraform or Pulumi, tracking additions, deletions and updates.
To conclude, the Lambda architecture remains a powerful option for ML systems—especially when combined with a modern feature platform like Tecton. By unifying batch and streaming code, you can enjoy flexibility, robustness, and ease of use without the usual drawbacks of duplicated logic.
If you’d like to try out the code, check out Tecton’s series on building a real-time fraud detection model ↓
How to become a top 1% AI/ML Engineer (Affiliate)
Did you know that becoming a top 1% AI/ML engineer has less to do with mastering ML frameworks and more to do with prioritizing your software engineering skills?
Here’s why:
AI/ML systems don’t just rely on cool algorithms — they thrive on robust, scalable code. Without strong software engineering fundamentals, your AI/ML models will struggle in production.
To build a foundation for success, focus on these core skills:
Writing clean, modular code
Designing efficient cloud architectures
Mastering programming languages like Python (and soon, Rust)
These software engineering skills are essential for creating AI/ML systems that work seamlessly in real-world scenarios.
So, how can you level up your skills effectively?
I’ve been exploring CodeCrafters.io, a platform tailored for developers who want to sharpen their software engineering expertise. It offers hands-on challenges where you build real-world tools from scratch, such as:
Docker
Git
Redis
Kafka
Shell
By working through these challenges, you’ll learn how to write production-grade code and gain practical experience in the technologies powering tomorrow's AI/ML systems.
Key takeaway:
Mastering software engineering is the foundation for unlocking the full potential of your AI/ML models.

Ready to accelerate your journey toward becoming a top 1% AI/ML engineer?
If so, use my affiliate link for a 40% discount on CodeCrafters.io ↓
The evolution of RAG, semantic search, and retrieval systems
The future of retrieval systems is taking an interesting turn. While this perspective might challenge some conventional thinking, it's worth exploring.
The key lies in multi-attribute vector indexes combining tabular and unstructured data (text, images, audio).
Why this matters
The reasoning is straightforward: most applications primarily utilize tabular data rather than unstructured data for search, analytics, and retrieval systems.
This reality suggests that incorporating tabular data into semantic search represents the next evolutionary step. And no, this isn't about text-to-SQL, which would only add unnecessary complexity to existing systems.
Building a multi-attribute vector index: A practical example
Let's examine how to implement this for Amazon e-commerce products, using Superlinked and MongoDB Atlas Vector Search as our tools:
1. Process the data
Skip chunking.
Clean up columns as you would in data science.
Handle basic data preparation (convert review counts to integers, address NaNs, format reviews), as in the sketch after this list.
Consider each table row as a node in your vector space.
2. Build the multi-attribute index
Define your schema using various variable types (product category, review count, value, description).
Establish vector similarity spaces connecting product description, price, and review rating attributes.
3. Query the index
Customize searches by applying weights to each attribute during semantic search.
Optimize results by prioritizing specific factors (e.g., low prices or high reviews), as in the sketch below.
Implementation insights
Using Superlinked, you can fine-tune searches through weighted attributes. This approach prioritizes different data aspects during retrieval while utilizing the same vector index, making experimentation quicker and implementation more cost-effective.
The detailed implementation strategy deserves its own discussion, but this framework provides a solid foundation for understanding the direction of retrieval systems in 2025.
For the full implementation, consider checking out our series on building an MVP for searching Amazon products using tabular semantic search:
Whenever you’re ready, there are 3 ways we can help you:
Perks: Exclusive discounts on our recommended learning resources (live courses, self-paced courses, learning platforms and books).
The LLM Engineer’s Handbook: Our bestseller book on mastering the art of engineering Large Language Model (LLM) systems from concept to production.
Free open-source courses: Master production AI with our end-to-end open-source courses, which reflect real-world AI projects, covering everything from system architecture to data collection and deployment.
Images
If not otherwise stated, all images are created by the author.