Evolve or perish: The new RAG paradigm
How to build your Second Brain AI assistant. End-to-end MLOps with Databricks.
This week’s topics:
How to build your Second Brain AI assistant
End-to-end MLOps with Databricks
Evolve or perish: The new RAG paradigm
How to build your Second Brain AI assistant
I’m a productivity freak, obsessed with optimizing workflows.
And recently, I’ve been exploring a new concept…
A Second Brain AI assistant.
This intelligent assistant can help you store, retrieve, and synthesize information to supercharge your productivity.
Along the way, I've learned a ton about the most advanced AI techniques and how to share them, such as:
GenAI, LLMs, RAG, Agents, Information Retrieval, and LLMOps
Teaching complex topics in a way that’s easy to grasp
Compiling long-form content into digestible, practical steps
And, of course, creating visual diagrams to explain even the trickiest concepts
I’m using all this knowledge to build a Second Brain AI Assistant, and I will show you how to do the same.
All you have to do is focus on 5 core components:
1. Data Pipelines
Data is the foundation of any AI system.
We will collect data from Notion.
The goal is to crawl, clean, and standardize the notes and links we find in Notion, then store them in a document database such as MongoDB (a minimal sketch follows the list below).
The data pipelines are split into two parts:
Data collection pipeline
ETL pipeline
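To make the ETL step concrete, here is a minimal sketch in Python, assuming a local MongoDB instance and a raw_docs list already produced by the data collection pipeline; the database, collection, and field names are illustrative, not the course's exact code:

```python
from pymongo import MongoClient


def etl_notion_documents(raw_docs: list[dict]) -> None:
    """Clean, standardize, and store crawled Notion documents."""
    client = MongoClient("mongodb://localhost:27017")
    collection = client["second_brain"]["documents"]

    for doc in raw_docs:
        standardized = {
            "title": doc.get("title", "").strip(),
            "content": doc.get("content", "").strip(),
            "source_url": doc.get("url"),
        }
        # Upsert keyed on the source URL so re-runs stay idempotent.
        collection.update_one(
            {"source_url": standardized["source_url"]},
            {"$set": standardized},
            upsert=True,
        )
```

Upserting on the source URL means re-crawling Notion never duplicates documents.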
2. Feature Pipelines
The feature pipeline leverages the raw data collected by the data pipeline in 2 ways (sketched after the list):
RAG feature pipeline
Dataset generation feature pipeline
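As a rough illustration of the RAG feature pipeline, here is a sketch that chunks documents and computes embeddings ready for a vector index; the all-MiniLM-L6-v2 model and the fixed-size chunking are assumptions for the example, not the course's actual choices:

```python
from sentence_transformers import SentenceTransformer


def build_rag_features(documents: list[str], chunk_size: int = 500) -> list[dict]:
    """Chunk documents and attach embeddings, ready for a vector index."""
    model = SentenceTransformer("all-MiniLM-L6-v2")
    records = []
    for doc in documents:
        # Naive fixed-size chunking; a real pipeline uses smarter splitting.
        chunks = [doc[i : i + chunk_size] for i in range(0, len(doc), chunk_size)]
        for chunk, emb in zip(chunks, model.encode(chunks)):
            records.append({"text": chunk, "embedding": emb.tolist()})
    return records
```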
3. Training Pipeline
To reduce costs, we fine-tune a small summarization LLM. It serves two roles: powering contextual retrieval to optimize the RAG layer, and acting as a tool within the Agentic RAG module.
We fine-tune a Llama 3.1 8B LLM using our summarization instruct dataset.
We use Unsloth to fine-tune the LLM using QLoRA (it runs smoothly in Google Colab) and Comet to track the experiments.
After testing, the final model is uploaded to the Hugging Face model registry.
Ultimately, it is deployed on Hugging Face’s serverless inference service for real-time inference.
We also use ZenML as an MLOps framework to manage the data, feature, and training pipelines.
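Here is a hedged sketch of what the QLoRA fine-tuning step can look like with Unsloth; exact argument names vary across trl and Unsloth versions, and the dataset path and Hub repo id are hypothetical:

```python
from datasets import load_dataset
from transformers import TrainingArguments
from trl import SFTTrainer
from unsloth import FastLanguageModel

# Load the base model in 4-bit (QLoRA) via Unsloth.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/Meta-Llama-3.1-8B",
    max_seq_length=2048,
    load_in_4bit=True,
)
# Attach LoRA adapters; only these small matrices are trained.
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
)
# Hypothetical path to our summarization instruct dataset.
dataset = load_dataset("json", data_files="summarization_instruct.jsonl", split="train")

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    dataset_text_field="text",  # assumes a pre-formatted "text" column
    args=TrainingArguments(
        per_device_train_batch_size=2,
        num_train_epochs=1,
        output_dir="outputs",
        report_to="comet_ml",  # logs the run to Comet
    ),
)
trainer.train()
# Push the adapters to the Hugging Face Hub (hypothetical repo id).
model.push_to_hub("your-username/llama-3.1-8b-summarizer")
```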
4. Agentic RAG layer (inference)
The inference layer is built as an Agentic RAG module using Hugging Face’s smolagents library.
It has access to 2 tools:
A MongoDB vector search retriever to do RAG and access the data from our Second Brain
A summarization real-time endpoint to summarize answers (using our custom fine-tuned model or OpenAI)
This is connected to a Gradio UI to simulate the assistant experience.
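A minimal sketch of the agent wiring with smolagents (the tool bodies are stand-ins for the real MongoDB retriever and summarization endpoint, and the model class name depends on your smolagents version):

```python
from smolagents import CodeAgent, HfApiModel, tool


@tool
def search_second_brain(query: str) -> str:
    """Retrieve relevant notes from our Second Brain via vector search.

    Args:
        query: Natural-language question to search the notes for.
    """
    # Stand-in for a MongoDB vector search aggregation (hypothetical).
    return f"Top matching notes for: {query}"


@tool
def summarize(text: str) -> str:
    """Summarize text with the fine-tuned summarization endpoint.

    Args:
        text: The text to summarize.
    """
    # Stand-in for an HTTP call to the deployed summarization model.
    return text[:200]


agent = CodeAgent(tools=[search_second_brain, summarize], model=HfApiModel())
print(agent.run("What have I saved about LLM fine-tuning?"))
```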
5. Observability pipeline
You can’t manage what you can’t track and measure.
Using Opik, we:
Monitor the traces through prompt monitoring.
Measure the quality of our Second Brain AI Assistant through LLM evaluation.
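At its simplest, tracing with Opik is a decorator; this sketch assumes Opik is already configured with your API key, and the function body is a stand-in for the real chain:

```python
from opik import track


@track  # each call is logged as a trace in Opik (inputs, outputs, latency)
def answer_question(question: str) -> str:
    # Stand-in for the full Agentic RAG chain (hypothetical logic).
    return f"Answer to: {question}"


answer_question("What did I note about contextual retrieval?")
```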
Want to learn more? I’ve created an entire FREE series on this topic ↓↓↓
End-to-end MLOps with Databricks (Affiliate)
Cohort 3 of the End-to-end MLOps with Databricks live course will start on May 5th.
Quoting Maria, one of the course creators:
“Cohort 3 will be very close to what I have always imagined it is supposed to be.”
I know Maria. She is a fantastic MLOps engineer with high-quality standards. So if she says that, you can bet this cohort will be REALLY top-notch.
Along with teaching the fundamentals of MLOps (using the Databricks ecosystem as their tooling of choice), in this cohort, they:
Pay more attention to Python packaging.
Include a dedicated module for ML testing.
Show alternative ways to do feature engineering (the MLOps way).
This is a safe bet if you want to accelerate your MLOps learning journey.

Using the code PAUL will get you 100 EUR off your registration.
Also, the course is 100% eligible for company reimbursement.
If you are considering enrolling or finding out more about the course, click below (remember, the next cohort starts on May 5th) ↓↓↓
Evolve or perish: The new RAG paradigm
RAG is evolving.
Standard retrieval is no longer enough.
If you’re building LLM-powered applications, you need something more powerful:
Agentic RAG.
That’s exactly what we’re tackling in Lesson 6 of the Second Brain AI Assistant course.
... and it's now LIVE!
Most RAG systems passively retrieve context and hope the LLM generates the right response.
But this approach is limited.
What if your assistant needs to reason about multiple sources?
What if retrieval alone doesn’t fully align with the user’s intent?
What if the retrieved context isn't enough, and the system needs to iterate?
Agentic RAG bridges that gap.
Instead of just fetching documents, agents dynamically plan, retrieve, and refine their responses (see the sketch after this list), making AI assistants:
Smarter
More interactive
More accurate
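To make the plan-retrieve-refine idea concrete, here is a toy loop; every helper is a hypothetical stand-in for an LLM or vector-search call:

```python
def plan_next_query(question: str, context: list[str]) -> str:
    # Stand-in: a real agent would ask the LLM what to look up next.
    return question


def retrieve(query: str) -> list[str]:
    # Stand-in for vector search over the Second Brain.
    return [f"context for '{query}'"]


def is_sufficient(question: str, context: list[str]) -> bool:
    # Stand-in: a real agent would ask the LLM to judge coverage.
    return len(context) >= 2


def agentic_rag(question: str, max_steps: int = 3) -> str:
    """Illustrative plan -> retrieve -> refine loop behind Agentic RAG."""
    context: list[str] = []
    for _ in range(max_steps):
        query = plan_next_query(question, context)  # plan
        context += retrieve(query)                  # retrieve
        if is_sufficient(question, context):        # refine / stop
            break
    return f"Answer to '{question}' using {len(context)} context chunks"
```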
And in Lesson 6, we’re building one from scratch...
Specifically, you’ll learn:
How to build an Agentic RAG module that goes beyond simple retrieval
Integrating retrieval with AI agents for dynamic, multi-step reasoning
Adding LLMOps practices, such as prompt monitoring, to optimize retrieval workflows for cost, latency, and throughput (using tools such as Opik)
Evaluating long and complex LLM chains to ensure reliability (moderation, hallucination, response performance)
Scaling retrieval architectures to handle real-world AI assistant demands
By the end of this free lesson, you’ll understand what it takes to build stable, efficient, and intelligent RAG-powered assistants ↓↓↓
Whenever you’re ready, there are 3 ways we can help you:
Perks: Exclusive discounts on our recommended learning resources
(books, live courses, self-paced courses and learning platforms).
The LLM Engineer’s Handbook: Our bestselling book that teaches you an end-to-end framework for building production-ready LLM and RAG applications, from data collection to deployment (get up to 20% off using our discount code).
Free open-source courses: Master production AI with our end-to-end open-source courses, which reflect real-world AI projects and cover everything from system architecture to data collection, training and deployment.
Images
If not otherwise stated, all images are created by the author.