How I'd add LLMOps to my GenAI app

One of the best books to start in AI engineering and GenAI. Data pipelines for AI assistants

Paul Iusztin

Mar 15, 2025

How I'd add LLMOps to my GenAI app

Here’s how I'd add prompt monitoring and evaluation to my GenAI app:

Observability is composed of monitoring (eyes) and evaluation (ears).

Thus, we must implement two different processes:

Monitoring
Evaluation

1. Monitoring: Collect prompt traces

We need a reliable system to gather prompt traces.

Remember that most GenAI apps use complex chains containing multiple steps, such as prompts and generated answers.

Thus, we need specialized software to collect these traces and valuable metadata such as model IDs or hyperparameters.

For example, Opik (an open-source tool from Comet) integrates well with various data sources and frameworks, such as:

Custom Python functions
Frameworks such as LangChain or LlamaIndex
API providers such as OpenAI, Anthropic or Bedrock
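
To make the idea concrete, here is a minimal sketch of what collecting a prompt trace means, using only the Python standard library. It is not Opik's API: the trace decorator, the in-memory TRACES list, and the model ID are all hypothetical names picked for illustration. A real tool would ship the record to a backend instead of a list.

```python
import functools
import time
import uuid
from typing import Any, Callable

# In-memory store standing in for a real tracing backend (hypothetical).
TRACES: list[dict[str, Any]] = []


def trace(step: str, **metadata: Any) -> Callable:
    """Record inputs, output, and metadata for every call of the wrapped function."""

    def decorator(fn: Callable) -> Callable:
        @functools.wraps(fn)
        def wrapper(*args: Any, **kwargs: Any) -> Any:
            output = fn(*args, **kwargs)
            TRACES.append(
                {
                    "trace_id": str(uuid.uuid4()),
                    "step": step,
                    "input": {"args": args, "kwargs": kwargs},
                    "output": output,
                    "metadata": metadata,  # e.g. model ID, hyperparameters
                    "timestamp": time.time(),
                }
            )
            return output

        return wrapper

    return decorator


@trace("generation", model_id="hypothetical-model-v1", temperature=0.2)
def generate_answer(prompt: str) -> str:
    # Placeholder for a real LLM call.
    return f"Echo: {prompt}"


generate_answer("What is LLMOps?")
```

Each step of a chain gets its own decorated function, so the prompt, the answer, and the metadata of every step end up in the same store.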

2. Monitoring: Collect system metrics such as latency

Along with the trace itself, it’s critical to log the latency at different generation levels:

the overall request
the pre-generation step
the generation step
the post-generation step

You get this out of the box by monitoring the full trace with a tool like Opik.
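
If you want to see what is being measured, here is a rough standard-library sketch of the four latency levels. The timed helper and the three placeholder steps are assumptions made for the example, not code from a tracing tool.

```python
import time
from contextlib import contextmanager
from typing import Iterator


@contextmanager
def timed(latencies: dict[str, float], name: str) -> Iterator[None]:
    """Store the wall-clock duration (in seconds) of the enclosed block under `name`."""
    start = time.perf_counter()
    try:
        yield
    finally:
        latencies[name] = time.perf_counter() - start


def handle_request(query: str) -> tuple[str, dict[str, float]]:
    latencies: dict[str, float] = {}
    with timed(latencies, "overall_request"):
        with timed(latencies, "pre_generation"):
            prompt = f"Answer briefly: {query.strip()}"  # e.g. retrieval, templating
        with timed(latencies, "generation"):
            raw = prompt.upper()  # placeholder for the LLM call
        with timed(latencies, "post_generation"):
            answer = raw.strip()  # e.g. parsing, guardrails
    return answer, latencies


answer, latencies = handle_request("  what is a trace?  ")
```

Logging the per-step numbers next to the overall one tells you whether a slow request comes from retrieval, the model call, or post-processing.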

3. Monitoring: Sample traces for evaluation

LLM evaluation is expensive, especially when using LLM-as-a-judge, so you sample only part of your monitored traces and pipe that part on for assessment.

For example, 30% of your traffic.
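
One simple way to do this, sketched below, is to hash the trace ID and keep the trace when the hash falls under the sample rate. This is my own assumption about how to sample, not something a specific tool prescribes. The upside is that the decision is deterministic: the same trace is always in or always out, no matter which worker looks at it.

```python
import hashlib


def should_evaluate(trace_id: str, sample_rate: float = 0.3) -> bool:
    """Deterministically select roughly `sample_rate` of traces by hashing their ID."""
    digest = hashlib.sha256(trace_id.encode("utf-8")).digest()
    # Map the first 8 bytes of the hash to a float in [0, 1).
    bucket = int.from_bytes(digest[:8], "big") / 2**64
    return bucket < sample_rate


# Hypothetical trace IDs, only to show the resulting share.
trace_ids = [f"trace-{i}" for i in range(10_000)]
sampled = [t for t in trace_ids if should_evaluate(t)]
print(f"Sampled {len(sampled) / len(trace_ids):.1%} of traces")
```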

4. Evaluation: Run your LLM evaluation as an offline batch pipeline

The simplest and cheapest way to evaluate your traces sampled from production is through an offline batch pipeline that runs on a schedule, for example, every hour.

You can leverage LLM-as-judge techniques that detect hallucination or moderation issues without requiring labels/ground truth.

You can route the samples your system failed on into your offline evaluation set, then run regression tests against it constantly to keep your GenAI system in check.

In a later iteration, you can run the LLM evaluation in real time on each sample, but that requires more engineering.

For steps 3 and 4, you can easily manage your monitoring and offline evaluation sets using a data registry, such as the Datasets abstraction provided by Opik.
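
Here is a minimal sketch of what one run of such a batch job could look like. Everything in it is illustrative: the Trace and Verdict classes are hypothetical, the regression set is a plain list instead of a data registry, and keyword_judge is a toy stand-in for a real LLM-as-a-judge call.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Trace:
    trace_id: str
    prompt: str
    answer: str


@dataclass
class Verdict:
    trace_id: str
    hallucination: bool
    moderation_issue: bool

    @property
    def failed(self) -> bool:
        return self.hallucination or self.moderation_issue


# A judge maps a trace to a verdict. In production this would wrap an LLM call.
Judge = Callable[[Trace], Verdict]


def keyword_judge(trace: Trace) -> Verdict:
    """Toy stand-in for an LLM judge: flags answers via hard-coded keywords."""
    text = trace.answer.lower()
    return Verdict(
        trace_id=trace.trace_id,
        hallucination="i made this up" in text,
        moderation_issue="offensive" in text,
    )


def run_batch(sampled: list[Trace], judge: Judge, regression_set: list[Trace]) -> list[Verdict]:
    """Score every sampled trace and append the failures to the regression set."""
    verdicts = [judge(t) for t in sampled]
    failed_ids = {v.trace_id for v in verdicts if v.failed}
    regression_set.extend(t for t in sampled if t.trace_id in failed_ids)
    return verdicts


regression_set: list[Trace] = []
batch = [
    Trace("t1", "Capital of France?", "Paris is the capital of France."),
    Trace("t2", "Who won in 2042?", "I made this up."),
]
verdicts = run_batch(batch, keyword_judge, regression_set)
```

A scheduler (cron, an orchestrator, or similar) would call run_batch once per interval on the traces sampled since the previous run.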

5. Evaluation: Visualization and reporting

The offline LLM evaluation pipeline will generate a report on each run, where you can easily detect anomalies or visualize the status of your system.

You can go further by attaching an alarm that fires when a threshold is crossed, for example, when moderation issues (e.g., racism) are detected.
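
As a rough sketch of the report plus alarm idea: aggregate the flags from one evaluation run into rates, then compare each rate with a threshold. The threshold values and the issue names below are assumptions for the example. You would tune them for your own system and plug the result into whatever alerting channel you use.

```python
from collections import Counter

# Hypothetical thresholds: maximum tolerated share of flagged traces per run.
THRESHOLDS = {"hallucination": 0.10, "moderation": 0.01}


def build_report(flags_per_trace: list[set[str]]) -> dict[str, float]:
    """Return the share of evaluated traces flagged for each issue type."""
    counts = Counter(flag for flags in flags_per_trace for flag in flags)
    total = len(flags_per_trace)
    return {issue: (counts[issue] / total if total else 0.0) for issue in THRESHOLDS}


def breached(report: dict[str, float]) -> list[str]:
    """List the issue types whose rate is strictly above its threshold."""
    return [issue for issue, rate in report.items() if rate > THRESHOLDS[issue]]


# One evaluation run over ten traces: one hallucination, one moderation issue.
run = [set(), {"hallucination"}, set(), {"moderation"}] + [set() for _ in range(6)]
report = build_report(run)
alerts = breached(report)
```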

Want the complete step-by-step guide?

We’ve written a detailed tutorial on building a production-ready prompt monitoring pipeline (part of the LLM Twin open-source course). Check it out here:

GO TO TUTORIAL

One of the best books to start in AI engineering and GenAI

I've recently read the book Building LLMs for Production by Louis-François Bouchard and Louie Peters and loved it!

I think it's the perfect book for people getting into GenAI, LLMs and RAG, as it lays down the foundations (in a very practical way) for aspects such as:

Reducing hallucinations for LLMs
Prompt engineering
Advanced RAG (chunking, optimization, evaluation)
Agents
Fine-tuning
LLM evaluation

Enough to get a strong sense of how GenAI and LLMs work in an industry setup.


Most chapters are backed up by small projects/tutorials built in LlamaIndex (or LangChain), DeepLake, and OpenAI.

Here are a few tutorials that I am excited about:

Creating Knowledge Graphs from textual data
Multimodal financial document analysis from PDFs
Improving LLMs with RLHF

While reading the book, you will understand how to use LlamaIndex and LangChain to build AI solutions.

Kudos to Louis and Louie for this masterpiece!

I think it works perfectly with our LLM Engineer's Handbook. Here is how I would approach getting into the field:

Start with Building LLMs for Production to get a strong sense of the field and what problems it can solve.

Get the book (15% off as a DML reader)

Wrap up with our LLM Engineer's Handbook, which teaches you an end-to-end framework for building LLM apps using LLMOps best practices.

Get the book (20% off as a DML reader)

...and boom, you are in the game of AI engineering.

Data pipelines for AI assistants

I've created 2 courses and written a book on web crawling.

After that experience, I can hands-down admit this:

Building crawlers is hard.

But it’s a critical skill for developing LLM applications - whether you're working on RAG or fine-tuning.

That’s why I’m excited to announce that we’ve just released the 2nd lesson of the Second Brain AI open-source course (all free thanks to our sponsors).

It’s all about tackling one of the toughest challenges...

Data pipelines.

In this lesson, we’re diving deep into:

ZenML as an MLOps framework for managing ML pipelines
Reading data from Notion via its API
Crawling ~400 links using Crawl4AI
Standardizing data into Markdown
Computing quality scores using heuristics and LLMs
Storing everything in MongoDB (as our data warehouse)
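
To give a feel for the heuristic side of quality scoring, here is a small sketch that scores a Markdown document between 0 and 1. These three heuristics and the min_chars default are my own illustrative assumptions, not the ones used in the lesson.

```python
import re


def quality_score(markdown: str, min_chars: int = 200) -> float:
    """Score a Markdown document in [0, 1] using simple, hypothetical heuristics."""
    text = markdown.strip()
    if not text:
        return 0.0
    # Heuristic 1: very short pages are usually navigation stubs or error pages.
    length_score = min(len(text) / min_chars, 1.0)
    # Heuristic 2: pages dominated by links carry little readable content.
    link_chars = sum(len(m) for m in re.findall(r"\[[^\]]*\]\([^)]*\)", text))
    link_score = 1.0 - link_chars / len(text)
    # Heuristic 3: a low share of letters suggests markup debris or junk.
    alpha_score = sum(c.isalpha() for c in text) / len(text)
    return round((length_score + link_score + alpha_score) / 3, 3)


article = "This is a paragraph of readable text. " * 20
nav_page = "[a](http://x.y) " * 5
print(quality_score(article), quality_score(nav_page))
```

Cheap checks like these can filter the obvious junk first, so the more expensive LLM-based scoring only runs on documents the heuristics cannot decide.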

This lesson will teach you how to build a scalable, production-ready AI assistant that uses RAG, LLMs, and agents.

From my own experience, crawling isn’t just about scraping the web...

It’s a super complex problem.

Why?

Because data is messy and inconsistent across sites, and your crawler is easily detected as a bot.

You have to respect the custom policies for every site you visit.
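
One common place where sites publish such policies is robots.txt. As a minimal sketch, Python's standard urllib.robotparser can check a URL against those rules before you crawl it. The robots.txt content and the crawler name below are made up for the example. In practice you would fetch the file from each site's root.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; normally fetched from https://<site>/robots.txt.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Crawl-delay: 2
"""

USER_AGENT = "second-brain-crawler"  # hypothetical crawler name

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def allowed(url: str) -> bool:
    """Check whether the parsed robots.txt rules permit fetching `url`."""
    return parser.can_fetch(USER_AGENT, url)


for url in ("https://example.com/blog/post", "https://example.com/private/notes"):
    print(url, "->", "crawl" if allowed(url) else "skip")
```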

But it’s a critical skill that can make or break your LLM application.

If you’re ready to level up and build your own Second Brain AI Assistant, this is the place to start.

📍 Check out the new lesson from the Second Brain AI Assistant FREE course:

Data pipelines for AI assistants

Whenever you’re ready, there are 3 ways we can help you:

Perks: Exclusive discounts on our recommended learning resources (books, live courses, self-paced courses and learning platforms).
The LLM Engineer’s Handbook: Our bestselling book, which teaches you an end-to-end framework for building production-ready LLM and RAG applications, from data collection to deployment (get up to 20% off using our discount code).
Free open-source courses: Master production AI with our end-to-end open-source courses, which reflect real-world AI projects and cover everything from system architecture to data collection, training and deployment.

Images

If not otherwise stated, all images are created by the author.
