Agents: Short vs. Long-term Memory in 2 Mins
This week’s topics:
Short-term vs. long-term memory
Prompt engineering from zero to hero - Master the art of AI interaction
Architecting the observability layer of an agent
Short-term vs. long-term memory
You can’t build human-like agents without human-like memory.
But most builders skip this part entirely.
They focus on prompts, tools, and orchestration.
But forget the system that holds it all together...
Memory.
In humans, memory is layered:
Working memory for what's happening right now
Semantic memory for facts and general knowledge
Procedural memory for skills and habits
Episodic memory for lived experience
Agents are no different.
If you want believable, useful, context-aware AI…
You MUST architect memory intentionally.
Here’s a breakdown of short and long-term memory types:
Short-term memory: Stores active conversation threads and recent steps. This is your context window. Lose it, and your agent resets after every turn.
For long-term memory, we have:
Semantic memory: Factual world knowledge retrieved through vector search or RAG. Think: “What’s the capital of France?” or “What is stoicism?”
Procedural memory: Defines what your agent knows how to do, encoded directly in your code. From simple templates to complex reasoning flows, this is your logic layer.
Episodic memory: Stores user-specific past interactions. It’s what enables continuity, personalization, and learning over time.
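To make the distinction concrete, here’s a minimal Python sketch of the four memory types. The class and field names are illustrative only (this is not the PhiloAgents API), and a real system would back semantic memory with vector search rather than a dict:

```python
from collections import deque

class AgentMemory:
    """Toy illustration of the four memory types (names are illustrative)."""

    def __init__(self, context_size: int = 4):
        # Short-term: a bounded context window; old turns fall off the end
        self.short_term = deque(maxlen=context_size)
        # Semantic: factual world knowledge (a real agent would use vector search / RAG)
        self.semantic = {"capital of France": "Paris"}
        # Procedural: what the agent knows how to do, encoded directly in code
        self.procedural = {"greet": lambda name: f"Hello, {name}!"}
        # Episodic: user-specific interaction history, kept across sessions
        self.episodic = []

    def add_turn(self, user_id: str, message: str) -> None:
        self.short_term.append(message)            # working context
        self.episodic.append((user_id, message))   # permanent, per-user record

memory = AgentMemory(context_size=2)
memory.add_turn("ada", "What's the capital of France?")
memory.add_turn("ada", "Tell me about Stoicism.")
memory.add_turn("ada", "Who founded it?")

print(list(memory.short_term))   # only the 2 most recent turns survive
print(len(memory.episodic))      # but all 3 turns are kept episodically
```

Note how the bounded deque captures the key failure mode of short-term-only agents: without the episodic list, the first question is simply gone.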
In our PhiloAgents course, we show how to wire all of this together.
Using MongoDB for structured memory
Using LangGraph (not LangChain!) to control memory flow
Using Groq for real-time LLM inference
And even using Opik (an open-source tool by Comet) to evaluate how memory shapes performance
TL;DR: A smart agent isn’t one that just thinks well... It’s one that remembers well, too.
Learn more in Lesson 3 of the PhiloAgents course ↓↓↓
Prompt Engineering from Zero to Hero - Master the Art of AI Interaction (Affiliate)
If you think you “know” prompt engineering...
Think again.
I’ve been following Nir Diamant for a while now - his GitHub repos and Substack have become go-to resources for AI practitioners.
He has a rare gift:
The ability to break down complex GenAI topics like he’s teaching a 7-year-old (without dumbing anything down).
... And now he’s done it again with a new eBook:
Prompt Engineering from Zero to Hero - Master the Art of AI Interaction

This isn’t just another “use more bullet points in your prompt” kind of guide.
It’s a practical deep dive with:
Code examples
Real-world exercises
Clear explanations of common mistakes
And the subtle mechanics behind great AI interaction
Get 20% off with code PAUL
Architecting the observability layer of an agent
LLM systems don’t fail silently.
They fail invisibly.
No trace, no metrics, no alerts - just wrong answers and confused users.
That’s why we architected a complete observability pipeline in the Second Brain AI Assistant course.
Powered by Opik (an open-source tool from Comet), it covers two key layers:
1/ Prompt Monitoring
Tracks full prompt traces (inputs, outputs, system prompts, latencies)
Visualizes chain execution flows and step-level timing
Captures metadata like model IDs, retrieval config, prompt templates, token count, and costs
Logs and analyzes latency metrics across stages (pre-gen, gen, post-gen), including:
Time to First Token (TTFT)
Tokens per Second (TPS)
Total response time
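As a quick sketch of what these metrics mean (plain Python, not Opik’s API), TTFT and TPS can be derived from the request timestamp and the arrival times of streamed tokens:

```python
def latency_metrics(request_ts: float, token_ts: list[float]) -> dict:
    """Derive basic streaming latency metrics from token arrival timestamps."""
    ttft = token_ts[0] - request_ts           # Time to First Token
    total = token_ts[-1] - request_ts         # total response time
    gen_window = token_ts[-1] - token_ts[0]   # generation phase only
    # TPS over the generation window; guard the single-token edge case
    tps = (len(token_ts) - 1) / gen_window if gen_window > 0 else float("inf")
    return {"ttft_s": ttft, "total_s": total, "tokens_per_s": tps}

# Example: request sent at t=0.0, five tokens arriving over 2.5 seconds
metrics = latency_metrics(0.0, [0.5, 1.0, 1.5, 2.0, 2.5])
print(metrics)  # TTFT 0.5s, total 2.5s, 2.0 tokens/s
```

Splitting TTFT from TPS matters in practice: a slow TTFT usually points at retrieval or queuing before generation, while a low TPS points at the model serving layer itself.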
So when your agent misbehaves, you can see exactly where and why.
2/ Evaluation for Agentic RAG
Runs automated tests on the agent’s responses
Uses LLM judges + custom heuristics (hallucination, relevance, structure)
Works offline (during dev) and post-deployment (on real prod samples)
Fully CI/CD-ready with performance alerts and eval dashboards
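To give a feel for the heuristic side of this, here’s a toy groundedness check in plain Python. This is not Opik’s evaluation API; real pipelines combine LLM judges with heuristics like this one, and production groundedness checks are far more sophisticated than word overlap:

```python
def heuristic_eval(answer: str, context: str) -> dict:
    """Toy heuristics in the spirit of hallucination/relevance checks."""
    answer_words = set(answer.lower().split())
    context_words = set(context.lower().split())
    # Groundedness proxy: fraction of answer words found in the retrieved context
    overlap = len(answer_words & context_words) / max(len(answer_words), 1)
    return {
        "grounded": overlap >= 0.5,          # crude hallucination flag
        "non_empty": bool(answer.strip()),   # structure: the agent said something
        "overlap": round(overlap, 2),
    }

result = heuristic_eval(
    answer="Stoicism was founded by Zeno of Citium",
    context="Zeno of Citium founded the Stoicism school in Athens",
)
print(result)
```

Cheap checks like this run on every CI job; the expensive LLM-judge evals can then be reserved for scheduled runs on sampled production traffic.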
It’s like integration testing, but for your RAG + agent stack.
The best part?
You can compare multiple versions side-by-side
Run scheduled eval jobs on live data
Catch quality regressions before your users do
This is Lesson 6 of the course (and it might be the most important one).
Because if your system can’t measure itself, it can’t improve.
Full breakdown + open-source code in the article below ↓↓↓
Whenever you’re ready, there are 3 ways we can help you:
Perks: Exclusive discounts on our recommended learning resources
(books, live courses, self-paced courses and learning platforms).
The LLM Engineer’s Handbook: Our bestseller book on teaching you an end-to-end framework for building production-ready LLM and RAG applications, from data collection to deployment (get up to 20% off using our discount code).
Free open-source courses: Master production AI with our end-to-end open-source courses, which reflect real-world AI projects and cover everything from system architecture to data collection, training and deployment.
Images
If not otherwise stated, all images are created by the author.