1. Why Agent Memory is the Missing Layer
Every LLM has a context window, and once a conversation ends (or the window overflows), that context is gone. For one-off queries that's fine; for agents that work with users over days, weeks, or months, it's a dealbreaker.
Consider these use cases that require persistent memory:
- A coding assistant that remembers your project structure, preferred patterns, and past bugs
- A customer support agent that knows a user's order history, past complaints, and preferences
- A personal assistant that learns your habits, calendar patterns, and communication style
- A research agent that accumulates findings across multiple sessions
The solution: a dedicated memory layer that compresses, indexes, and retrieves the right context at the right time — independent of the context window.
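Concretely, every memory layer in this roundup boils down to two operations: a write path (extract what matters and persist it) and a read path (retrieve what's relevant right now). Here's a minimal sketch of that shared shape, with illustrative names rather than any particular library's API:

from typing import Protocol

class MemoryLayer(Protocol):
    def add(self, messages: list[dict], user_id: str) -> None:
        """Write path: decide what's worth keeping and persist it."""
        ...

    def search(self, query: str, user_id: str, limit: int = 5) -> list[str]:
        """Read path: return stored memories relevant to the current message."""
        ...

The tools below differ mainly in what happens between add and search: LLM-based fact extraction (Mem0), temporal graph construction (Zep), or agent-driven paging (Letta).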
2. Four Types of Agent Memory
| Type | What It Stores | Analogy | Implementation |
|---|---|---|---|
| In-context (buffer) | Recent conversation turns | Working memory | Last N messages in prompt |
| Episodic | Past events & interactions | Diary | Vector DB + timestamp |
| Semantic | Facts, preferences, knowledge | Personal encyclopedia | Knowledge graph / vector DB |
| Procedural | Skills, workflows, instructions | Muscle memory | System prompt + fine-tuning |
Production agents need all four layers. Let's look at the best tools for each.
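To see how the four types meet at prompt-assembly time, here's a schematic sketch; the helper functions are hypothetical stand-ins for whatever store backs each layer:

def build_context(user_id: str, message: str) -> str:
    # Procedural: skills and instructions live in the system prompt (or a fine-tune)
    system = "You are a careful assistant. Follow the user's coding conventions."
    # In-context buffer: the last N turns, verbatim
    recent = get_recent_turns(user_id, n=10)               # hypothetical helper
    # Episodic: past events relevant to this message (vector DB + timestamps)
    episodes = search_episodes(user_id, message, top_k=3)  # hypothetical helper
    # Semantic: stable facts and preferences (knowledge graph / vector DB)
    facts = get_user_facts(user_id)                        # hypothetical helper
    return (f"{system}\n\nKnown facts:\n{facts}\n\n"
            f"Relevant past events:\n{episodes}\n\nRecent turns:\n{recent}")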
3. Mem0 — Drop-in Persistent Memory
⭐ Mem0 in One Sentence
An intelligent memory layer that automatically extracts, stores, and retrieves relevant memories from conversations — with a 3-line integration.
Mem0 (pip package mem0ai) is the most popular open-source memory solution for LLM agents in 2026. It uses an LLM to automatically identify what's worth remembering and stores facts as structured memories in a vector database.
Quick start
pip install mem0ai

from mem0 import Memory

# Initialize Mem0 (uses a local vector store by default)
mem = Memory()

# Or with a custom config (production)
config = {
    "llm": {
        "provider": "openai",
        "config": {
            "model": "deepseek-chat",
            "api_key": "your-key",
            "base_url": "https://api.deepseek.com"
        }
    },
    "vector_store": {
        "provider": "qdrant",
        "config": {"host": "localhost", "port": 6333}
    }
}
mem = Memory.from_config(config)

# Add memories from a conversation
messages = [
    {"role": "user", "content": "I prefer Python over JavaScript, and I'm building a RAG app"},
    {"role": "assistant", "content": "Got it! I'll use Python examples for your RAG app."}
]
mem.add(messages, user_id="user_123")
# Mem0 automatically extracts: "Prefers Python" + "Building a RAG app"

# Retrieve relevant memories
# (newer Mem0 releases return {"results": [...]}; unwrap that list if so)
memories = mem.search("What language should I use?", user_id="user_123")
for m in memories:
    print(f"[{m['score']:.2f}] {m['memory']}")
# → [0.95] User prefers Python over JavaScript
# → [0.87] User is building a RAG application
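Beyond add and search, Mem0 also exposes housekeeping calls for inspecting and correcting what it has stored (method names per the Mem0 docs at the time of writing; verify against your installed version):

# List everything stored for a user
all_memories = mem.get_all(user_id="user_123")

# Correct a memory the extractor got wrong
mem.update(memory_id="<memory-id>", data="User prefers TypeScript for frontend work")

# Remove one memory, or wipe a user entirely (useful for deletion requests)
mem.delete(memory_id="<memory-id>")
mem.delete_all(user_id="user_123")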
Integrating Mem0 into an agent
from openai import OpenAI
from mem0 import Memory

client = OpenAI(api_key="your-key", base_url="https://api.deepseek.com")
mem = Memory()

def chat_with_memory(user_message: str, user_id: str) -> str:
    # 1. Retrieve relevant memories
    relevant = mem.search(user_message, user_id=user_id, limit=5)
    memory_context = "\n".join(f"- {m['memory']}" for m in relevant)

    # 2. Build the prompt with memory context
    system = f"""You are a helpful personal assistant.
What you know about this user:
{memory_context if memory_context else 'Nothing yet.'}
Use this context to personalize your responses."""

    # 3. Get the LLM response
    response = client.chat.completions.create(
        model="deepseek-chat",
        messages=[
            {"role": "system", "content": system},
            {"role": "user", "content": user_message}
        ]
    )
    answer = response.choices[0].message.content

    # 4. Store new memories from this exchange
    mem.add([
        {"role": "user", "content": user_message},
        {"role": "assistant", "content": answer}
    ], user_id=user_id)
    return answer

# Session 1
print(chat_with_memory("I'm allergic to peanuts", user_id="alice"))

# Session 2 (days later — memory persists)
print(chat_with_memory("Suggest a snack for me", user_id="alice"))
# → "Since you're allergic to peanuts, I'd suggest..."
4. Zep — Temporal Knowledge Graph
🔵 Zep in One Sentence
A knowledge graph-based memory that tracks entities, relationships, and how facts change over time — ideal for enterprise applications needing structured recall.
Zep goes beyond simple vector search. It builds a temporal knowledge graph of entities (people, products, organizations) and their relationships, and it automatically handles contradictions and fact updates ("User used to work at Google, now works at Anthropic").
Zep integration
pip install zep-python

from zep_python import ZepClient
from zep_python.memory import Memory, Message, Session

client = ZepClient(api_key="your-zep-key")  # or point at a self-hosted instance

# Create a session
session_id = "user_alice_session_001"
client.memory.add_session(Session(
    session_id=session_id,
    metadata={"user_id": "alice"}
))

# Add messages to memory
messages = [
    Message(role="human", content="I just got promoted to Senior Engineer at Stripe"),
    Message(role="ai", content="Congratulations on your promotion at Stripe!")
]
client.memory.add_memory(session_id, Memory(messages=messages))

# Search memory (semantic + temporal)
results = client.memory.search_memory(session_id, "What does Alice do for work?")
for r in results.results:
    print(f"[{r.dist:.2f}] {r.message.content}")

# Get entity facts (knowledge graph)
facts = client.memory.get_session_facts(session_id)
for fact in facts.facts:
    print(f"Fact: {fact.fact} | Valid: {fact.valid_at} → {fact.invalid_at or 'now'}")
Zep knowledge graph query
# Query entities across all sessions for a user
graph_results = client.graph.search(
    user_id="alice",
    query="job title and employer",
    scope="edges"  # relationships between entities
)
for edge in graph_results.edges:
    print(f"{edge.source_node_name} --[{edge.fact}]--> {edge.target_node_name}")
5. Letta (MemGPT) — Stateful Agents with OS-like Memory
🟢 Letta in One Sentence
An agent framework with an OS-inspired memory architecture — in-context working memory, archival storage, and memory tools that agents use autonomously.
Letta (previously MemGPT) treats memory like an operating system: the agent has a limited "RAM" (context window) and an unlimited "disk" (archival storage). The agent itself decides when to page memories in and out using built-in memory tools.
pip install letta-client

from letta_client import Letta

# Connect to a Letta server (local or cloud)
client = Letta(token="your-token", base_url="https://app.letta.com")

# Create a stateful agent with memory
agent = client.agents.create(
    name="personal_assistant",
    memory_blocks=[
        {
            "label": "human",
            "value": "Name: Alice\nOccupation: Senior Engineer at Stripe",
            "limit": 2000
        },
        {
            "label": "persona",
            "value": "You are a helpful personal assistant who remembers everything about the user.",
            "limit": 1000
        }
    ],
    model="deepseek-chat",
    embedding="openai/text-embedding-3-small"
)

# Chat — Letta manages all memory automatically
response = client.agents.messages.create(
    agent_id=agent.id,
    messages=[{"role": "user", "content": "What do you know about me?"}]
)
for msg in response.messages:
    if msg.message_type == "assistant_message":
        print(msg.content)
# → "You're Alice, a Senior Engineer at Stripe..."

# The agent autonomously archives old memories and retrieves relevant ones
# using tools like core_memory_append, archival_memory_insert, and archival_memory_search
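To make the RAM/disk analogy concrete, here's a framework-agnostic sketch of the paging idea (an illustration only, not Letta internals): when the in-context buffer exceeds its token budget, the oldest entries are evicted to archival storage, and relevant ones are searched back in on demand.

def page_out_if_needed(context: list[str], archival: list[str],
                       budget: int, count_tokens=len) -> None:
    # Evict the oldest entries from "RAM" (context) to "disk" (archival)
    # when over budget; count_tokens defaults to len() as a crude proxy.
    while sum(count_tokens(m) for m in context) > budget and len(context) > 1:
        archival.append(context.pop(0))

def page_in(archival: list[str], query: str, top_k: int = 3) -> list[str]:
    # Naive keyword recall as a stand-in for embedding search
    # (cf. Letta's archival_memory_search tool)
    words = query.lower().split()
    hits = [m for m in archival if any(w in m.lower() for w in words)]
    return hits[:top_k]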
6. DIY Memory with Redis + pgvector
🟣 When to Go DIY
Full control over memory structure, storage, and retrieval. Best when you have specific compliance requirements or want to minimize dependencies.
import json
import time

import psycopg2
import redis
from openai import OpenAI

client = OpenAI(api_key="your-key", base_url="https://api.deepseek.com")

# Redis for fast short-term memory (recent N turns)
r = redis.Redis(host="localhost", port=6379, decode_responses=True)

# pgvector for long-term semantic memory
conn = psycopg2.connect("postgresql://user:pass@localhost/agentdb")
cur = conn.cursor()

# Setup (run once)
cur.execute("""
CREATE EXTENSION IF NOT EXISTS vector;
CREATE TABLE IF NOT EXISTS agent_memories (
    id SERIAL PRIMARY KEY,
    user_id VARCHAR(255),
    content TEXT,
    embedding vector(1536),
    created_at TIMESTAMP DEFAULT NOW(),
    memory_type VARCHAR(50)  -- 'fact' | 'episode' | 'preference'
);
CREATE INDEX IF NOT EXISTS agent_memories_embedding_idx
    ON agent_memories USING ivfflat (embedding vector_cosine_ops);
""")
conn.commit()

def embed(text: str) -> list[float]:
    resp = client.embeddings.create(input=text, model="text-embedding-3-small")
    return resp.data[0].embedding

def store_memory(user_id: str, content: str, memory_type: str = "episode"):
    emb = embed(content)
    # ::vector casts the float list (sent as a Postgres array) to the vector type
    cur.execute(
        "INSERT INTO agent_memories (user_id, content, embedding, memory_type) "
        "VALUES (%s, %s, %s::vector, %s)",
        (user_id, content, emb, memory_type)
    )
    conn.commit()

def retrieve_memories(user_id: str, query: str, top_k: int = 5) -> list[str]:
    query_emb = embed(query)
    cur.execute("""
        SELECT content, 1 - (embedding <=> %s::vector) AS similarity
        FROM agent_memories
        WHERE user_id = %s
        ORDER BY embedding <=> %s::vector
        LIMIT %s
    """, (query_emb, user_id, query_emb, top_k))
    return [row[0] for row in cur.fetchall()]

def add_to_buffer(user_id: str, role: str, content: str, max_turns: int = 10):
    key = f"buffer:{user_id}"
    r.rpush(key, json.dumps({"role": role, "content": content, "ts": time.time()}))
    r.ltrim(key, -max_turns * 2, -1)  # 2 messages per turn; keep the last N turns

def get_buffer(user_id: str) -> list[dict]:
    key = f"buffer:{user_id}"
    return [json.loads(m) for m in r.lrange(key, 0, -1)]

# Full agent with layered memory
def agent_reply(user_id: str, user_message: str) -> str:
    # Layer 1: Recent buffer (short-term)
    buffer = get_buffer(user_id)
    buffer_text = "\n".join(f"{m['role']}: {m['content']}" for m in buffer[-6:])

    # Layer 2: Relevant long-term memories (semantic)
    long_term = retrieve_memories(user_id, user_message)
    lt_text = "\n".join(f"- {m}" for m in long_term)

    system = f"""You are a helpful assistant with memory.
Recent conversation:
{buffer_text}
Long-term memories about this user:
{lt_text if lt_text else 'None yet.'}"""

    resp = client.chat.completions.create(
        model="deepseek-chat",
        messages=[
            {"role": "system", "content": system},
            {"role": "user", "content": user_message}
        ]
    )
    answer = resp.choices[0].message.content

    # Store to both layers
    add_to_buffer(user_id, "user", user_message)
    add_to_buffer(user_id, "assistant", answer)
    store_memory(user_id, f"User said: {user_message}", "episode")
    return answer
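A quick usage sketch: because the buffer lives in Redis and the memories live in Postgres, the agent's knowledge survives process restarts.

# First run
print(agent_reply("alice", "I'm vegetarian and I'm training for a marathon"))

# After a restart (new process, same stores): the memories are still there
print(agent_reply("alice", "Plan a dinner for me tonight"))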
7. Full Comparison: Mem0 vs Zep vs Letta vs DIY
| Dimension | Mem0 | Zep | Letta | DIY |
|---|---|---|---|---|
| Setup complexity | ⭐ (3 lines) | ⭐⭐⭐ | ⭐⭐⭐ | ⭐⭐⭐⭐⭐ |
| Memory type | Facts/Semantic | Knowledge Graph | OS-inspired layers | Custom |
| Temporal tracking | Limited | ✅ Excellent | Limited | Custom |
| Auto extraction | ✅ LLM-based | ✅ LLM-based | ✅ Agent-driven | ❌ Manual |
| Self-hosted | ✅ | ✅ | ✅ | ✅ |
| Multi-user | ✅ | ✅ | ✅ | Custom |
| LangChain integration | ✅ | ✅ | Partial | Custom |
| Production readiness | High | High | Medium | Depends |
| Best for | Quick start, chatbots | Enterprise, structured data | Long-running agents | Compliance/control |
| Pricing | Open-source + Cloud | Open-source + Cloud | Open-source + Cloud | Infra cost only |
8. Production Memory Architecture Pattern
Here's the memory stack we recommend for a production AI assistant in 2026:
┌────────────────────────────────────────────────────────┐
│                      USER MESSAGE                      │
└──────────────────────────┬─────────────────────────────┘
                           │
              ┌────────────▼────────────┐
              │   Layer 1: Buffer       │ ← Redis (last 10 turns, ~1ms)
              │   (Working Memory)      │
              └────────────┬────────────┘
                           │
              ┌────────────▼────────────┐
              │   Layer 2: Semantic     │ ← Mem0 / pgvector (top-5 facts, ~50ms)
              │   (Long-term Memory)    │
              └────────────┬────────────┘
                           │
              ┌────────────▼────────────┐
              │   Layer 3: Knowledge    │ ← Zep (entity relationships, ~100ms)
              │   (Structured Facts)    │   Only for enterprise use cases
              └────────────┬────────────┘
                           │
              ┌────────────▼────────────┐
              │   LLM (DeepSeek V4)     │ ← Assembled context → response
              └────────────┬────────────┘
                           │
              ┌────────────▼────────────┐
              │   Memory Update         │ ← Async background task
              │   (Extract + Store)     │   Don't block the response
              └─────────────────────────┘
- Start with Mem0 — 3-line integration, works well for 90% of use cases
- Add Zep when you need entity tracking or fact contradiction handling
- Use Letta when your agent runs autonomously over days/weeks
- Always async memory writes — run extraction and storage in a background task so storage latency never delays the response (see the sketch below)
- Memory TTL — set expiry on episodic memories (e.g., 1 year); keep semantic memories indefinitely
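Here's a minimal sketch of those last two rules, reusing the DIY helpers from section 6 (store_memory, add_to_buffer, cur, conn); the one-year window is an illustrative policy, not a vendor recommendation:

import threading

EPISODIC_TTL_DAYS = 365  # illustrative retention policy

def remember_async(user_id: str, user_message: str, answer: str) -> None:
    """Fire-and-forget memory write: the user gets the reply immediately,
    while extraction and storage run on a background thread."""
    def _write():
        add_to_buffer(user_id, "user", user_message)
        add_to_buffer(user_id, "assistant", answer)
        store_memory(user_id, f"User said: {user_message}", "episode")
    threading.Thread(target=_write, daemon=True).start()

def expire_old_episodes() -> None:
    """Run periodically (cron/scheduler): drop expired episodic memories,
    leaving semantic facts and preferences untouched."""
    cur.execute(
        "DELETE FROM agent_memories "
        "WHERE memory_type = 'episode' AND created_at < NOW() - make_interval(days => %s)",
        (EPISODIC_TTL_DAYS,)
    )
    conn.commit()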
Quick Decision: Which to Choose?
| Use Case | Recommendation |
|---|---|
| Personal assistant / chatbot | Mem0 (managed cloud) |
| Enterprise CRM / support | Zep + self-hosted |
| Long-running autonomous agent | Letta |
| Compliance-heavy (GDPR/HIPAA) | DIY + pgvector |
| Prototype / hackathon | Mem0 OSS, local Chroma |
| Multi-agent system | Mem0 (shared memory layer) |
Browse all memory tools, vector databases, and agent frameworks at AgDex.ai — 420+ AI agent tools organized by category.