Agents: Short vs. Long-term Memory in 2 Mins
This week’s topics:
Short-term vs. long-term memory
Prompt engineering from zero to hero - Master the art of AI interaction
Architecting the observability layer of an agent
Short-term vs. long-term memory
You can’t build human-like agents without human-like memory.
But most builders skip this part entirely.
They focus on prompts, tools, and orchestration.
But forget the system that holds it all together...
Memory.
In humans, memory is layered:
Working memory for what's happening right now
Semantic memory for facts and general knowledge
Procedural memory for skills and habits
Episodic memory for lived experience
Agents are no different.
If you want believable, useful, context-aware AI…
You MUST architect memory intentionally.
Here’s a breakdown of short and long-term memory types:
Short-term memory: Stores active conversation threads and recent steps. This is your context window. Lose it, and your agent resets after every turn.
For long-term memory, we have:
Semantic memory: Factual world knowledge retrieved through vector search or RAG. Think: “What’s the capital of France?” or “What is stoicism?”
Procedural memory: Defines what your agent knows how to do, encoded directly in your code. From simple templates to complex reasoning flows, this is your logic layer.
Episodic memory: Stores user-specific past interactions. It’s what enables continuity, personalization, and learning over time.
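To make the distinction concrete, here’s a minimal Python sketch of the four memory types. The class and field names are illustrative only (this is not the PhiloAgents API), and a real system would back semantic memory with vector search rather than a dict:

```python
from collections import deque

class AgentMemory:
    """Toy illustration of the four memory types (names are illustrative)."""

    def __init__(self, context_size: int = 4):
        # Short-term: a bounded context window; old turns fall off the end
        self.short_term = deque(maxlen=context_size)
        # Semantic: factual world knowledge (a real agent would use vector search / RAG)
        self.semantic = {"capital of France": "Paris"}
        # Procedural: what the agent knows how to do, encoded directly in code
        self.procedural = {"greet": lambda name: f"Hello, {name}!"}
        # Episodic: user-specific interaction history, kept across sessions
        self.episodic = []

    def add_turn(self, user_id: str, message: str) -> None:
        self.short_term.append(message)            # working context
        self.episodic.append((user_id, message))   # permanent, per-user record

memory = AgentMemory(context_size=2)
memory.add_turn("ada", "What's the capital of France?")
memory.add_turn("ada", "Tell me about Stoicism.")
memory.add_turn("ada", "Who founded it?")

print(list(memory.short_term))   # only the 2 most recent turns survive
print(len(memory.episodic))      # but all 3 turns are kept episodically
```

Note how the bounded deque captures the key failure mode of short-term-only agents: without the episodic list, the first question is simply gone.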
In our PhiloAgents course, we show how to wire all of this together.
Using MongoDB for structured memory
Using LangGraph (not LangChain!) to control memory flow
Using Groq for real-time LLM inference
And even using Opik (an open-source tool by Comet) to evaluate how memory shapes performance
TL;DR: A smart agent isn’t one that just thinks well... It’s one that remembers well, too.
Learn more in Lesson 3 of the PhiloAgents course ↓↓↓
Prompt Engineering from Zero to Hero - Master the Art of AI Interaction (Affiliate)
If you think you “know” prompt engineering...
Think again.
I’ve been following Nir Diamant for a while now - his GitHub repos and Substack have become go-to resources for AI practitioners.
He has a rare gift:
The ability to break down complex GenAI topics like he’s teaching a 7-year-old (without dumbing anything down).
... And now he’s done it again with a new eBook:
Prompt Engineering from Zero to Hero - Master the Art of AI Interaction

This isn’t just another “use more bullet points in your prompt” kind of guide.
It’s a practical deep dive with:
Code examples
Real-world exercises
Clear explanations of common mistakes
And the subtle mechanics behind great AI interaction
Get 20% off with code PAUL
Architecting the observability layer of an agent
LLM systems don’t fail silently.
They fail invisibly.
No trace, no metrics, no alerts - just wrong answers and confused users.
That’s why we architected a complete observability pipeline in the Second Brain AI Assistant course.
Powered by Opik (an open-source tool from Comet), it covers two key layers:
1/ Prompt Monitoring
Tracks full prompt traces (inputs, outputs, system prompts, latencies)
Visualizes chain execution flows and step-level timing
Captures metadata like model IDs, retrieval config, prompt templates, token count, and costs
Logs and analyzes latency metrics across stages (pre-gen, gen, post-gen), including:
Time to First Token (TTFT)
Tokens per Second (TPS)
Total response time
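As a quick sketch of what these metrics mean (plain Python, not Opik’s API), TTFT and TPS can be derived from the request timestamp and the arrival times of streamed tokens:

```python
def latency_metrics(request_ts: float, token_ts: list[float]) -> dict:
    """Derive basic streaming latency metrics from token arrival timestamps."""
    ttft = token_ts[0] - request_ts           # Time to First Token
    total = token_ts[-1] - request_ts         # total response time
    gen_window = token_ts[-1] - token_ts[0]   # generation phase only
    # TPS over the generation window; guard the single-token edge case
    tps = (len(token_ts) - 1) / gen_window if gen_window > 0 else float("inf")
    return {"ttft_s": ttft, "total_s": total, "tokens_per_s": tps}

# Example: request sent at t=0.0, five tokens arriving over 2.5 seconds
metrics = latency_metrics(0.0, [0.5, 1.0, 1.5, 2.0, 2.5])
print(metrics)  # TTFT 0.5s, total 2.5s, 2.0 tokens/s
```

Splitting TTFT from TPS matters in practice: a slow TTFT usually points at retrieval or queuing before generation, while a low TPS points at the model serving layer itself.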
So when your agent misbehaves, you can see exactly where and why.
2/ Evaluation for Agentic RAG
Runs automated tests on the agent’s responses
Uses LLM judges + custom heuristics (hallucination, relevance, structure)
Works offline (during dev) and post-deployment (on real prod samples)
Fully CI/CD-ready with performance alerts and eval dashboards
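To give a feel for the heuristic side of this, here’s a toy groundedness check in plain Python. This is not Opik’s evaluation API; real pipelines combine LLM judges with heuristics like this one, and production groundedness checks are far more sophisticated than word overlap:

```python
def heuristic_eval(answer: str, context: str) -> dict:
    """Toy heuristics in the spirit of hallucination/relevance checks."""
    answer_words = set(answer.lower().split())
    context_words = set(context.lower().split())
    # Groundedness proxy: fraction of answer words found in the retrieved context
    overlap = len(answer_words & context_words) / max(len(answer_words), 1)
    return {
        "grounded": overlap >= 0.5,          # crude hallucination flag
        "non_empty": bool(answer.strip()),   # structure: the agent said something
        "overlap": round(overlap, 2),
    }

result = heuristic_eval(
    answer="Stoicism was founded by Zeno of Citium",
    context="Zeno of Citium founded the Stoicism school in Athens",
)
print(result)
```

Cheap checks like this run on every CI job; the expensive LLM-judge evals can then be reserved for scheduled runs on sampled production traffic.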
It’s like integration testing, but for your RAG + agent stack.
The best part?
You can compare multiple versions side-by-side
Run scheduled eval jobs on live data
Catch quality regressions before your users do
This is Lesson 6 of the course (and it might be the most important one).
Because if your system can’t measure itself, it can’t improve.
Full breakdown + open-source code in the article below ↓↓↓
Whenever you’re ready, there are 3 ways we can help you:
Perks: Exclusive discounts on our recommended learning resources
(books, live courses, self-paced courses and learning platforms).
The LLM Engineer’s Handbook: Our bestseller book on teaching you an end-to-end framework for building production-ready LLM and RAG applications, from data collection to deployment (get up to 20% off using our discount code).
Free open-source courses: Master production AI with our end-to-end open-source courses, which reflect real-world AI projects and cover everything from system architecture to data collection, training and deployment.
Images
If not otherwise stated, all images are created by the author.