Here's the uncomfortable truth about most AI agents: they have amnesia. Every conversation starts from zero. Every session forgets the last. Your agent might be brilliant at reasoning, but if it can't remember what happened 10 minutes ago, it's useless for anything beyond one-shot tasks.
Memory is what separates a toy demo from a production agent. In this guide, we'll break down the different types of AI agent memory, how they work under the hood, which tools to use, and how to build agents that actually remember.
Without memory, an AI agent is like a contractor who shows up every morning having forgotten everything about your project. You'd have to re-explain the architecture, the decisions you've made, and the problems you've already solved. Every. Single. Day.
Memory enables agents to:

- Maintain context across multi-turn conversations
- Pick up tomorrow where they left off today
- Learn from past successes and failures instead of repeating mistakes
- Remember user preferences without being re-told every session
Not all memory is created equal. AI agents use different memory systems for different purposes, just like humans use working memory, episodic memory, and procedural memory differently.
This is the LLM's "RAM" — the conversation context that the model can see right now. Every message, tool result, and system prompt lives here until the context window fills up.
| Model | Context Window | Effective Limit |
|---|---|---|
| GPT-4o | 128K tokens | ~80-100K usable |
| Claude Opus 4 | 200K tokens | ~150K usable |
| Gemini 2.5 Pro | 1M tokens | ~700K usable |
| DeepSeek V3 | 128K tokens | ~90K usable |
Limitations: Context windows are expensive (you pay per token), have hard ceilings, and degrade in quality as they fill — models perform worse with very long contexts ("lost in the middle" problem).
Best for: Current task context, recent conversation history, active instructions.
This bridges individual messages within a session. Most chat interfaces handle this automatically by sending the full conversation history with each API call. For agents, you manage this explicitly.
```python
# Simple conversation memory with a sliding window
class ConversationMemory:
    def __init__(self, max_messages=50):
        self.messages = []
        self.max_messages = max_messages

    def add(self, role, content):
        self.messages.append({"role": role, "content": content})
        # Keep the system prompt plus only the most recent messages.
        # (Filter system messages out of the window to avoid duplicating them.)
        if len(self.messages) > self.max_messages:
            system = [m for m in self.messages if m["role"] == "system"]
            recent = [m for m in self.messages[-self.max_messages:]
                      if m["role"] != "system"]
            self.messages = system + recent

    def get_context(self):
        return self.messages
```
Best for: Multi-turn conversations, task continuity within a session.
This is where it gets interesting. Long-term memory persists between sessions — when the agent "wakes up" tomorrow, it remembers what happened today. There are several approaches:
File-based memory — The simplest approach. Write important information to files, read them at the start of each session.
```python
# File-based persistent memory (what Pax uses)
import json
from datetime import datetime
from pathlib import Path

class FileMemory:
    def __init__(self, memory_dir="memory/"):
        self.dir = Path(memory_dir)
        self.dir.mkdir(parents=True, exist_ok=True)

    def save(self, key, data, category="general"):
        path = self.dir / f"{category}_{key}.json"
        path.write_text(json.dumps({
            "key": key,
            "category": category,
            "data": data,
            "saved_at": datetime.now().isoformat()
        }, indent=2))

    def load(self, key, category="general"):
        path = self.dir / f"{category}_{key}.json"
        if path.exists():
            return json.loads(path.read_text())["data"]
        return None

    def search(self, query):
        """Simple keyword search across all memories"""
        results = []
        for path in self.dir.glob("*.json"):
            content = path.read_text()
            if query.lower() in content.lower():
                results.append(json.loads(content))
        return results
```
Vector database memory — For agents that need semantic search over large memory stores. Store embeddings of past interactions, retrieve relevant memories based on similarity.
```python
# Vector-based memory with ChromaDB
from datetime import datetime
import chromadb

class VectorMemory:
    def __init__(self):
        self.client = chromadb.PersistentClient(path="./agent_memory")
        self.collection = self.client.get_or_create_collection(
            name="agent_memories",
            metadata={"hnsw:space": "cosine"}
        )

    def store(self, text, metadata=None):
        self.collection.add(
            documents=[text],
            ids=[f"mem_{datetime.now().timestamp()}"],
            metadatas=[metadata or {}]
        )

    def recall(self, query, n_results=5):
        results = self.collection.query(
            query_texts=[query],
            n_results=n_results
        )
        return results["documents"][0]
```
Database memory — For structured data that needs ACID guarantees: user preferences, task history, financial records.
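A minimal sketch of this approach, using Python's built-in `sqlite3` to persist user preferences. The `prefs` table and its columns are illustrative, not a standard schema:

```python
import sqlite3

class PreferenceStore:
    def __init__(self, path=":memory:"):
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS prefs ("
            "user_id TEXT, key TEXT, value TEXT, "
            "PRIMARY KEY (user_id, key))"
        )

    def set(self, user_id, key, value):
        # Upsert: the latest value per (user, key) wins
        self.db.execute(
            "INSERT INTO prefs VALUES (?, ?, ?) "
            "ON CONFLICT(user_id, key) DO UPDATE SET value=excluded.value",
            (user_id, key, value),
        )
        self.db.commit()

    def get(self, user_id, key):
        row = self.db.execute(
            "SELECT value FROM prefs WHERE user_id=? AND key=?",
            (user_id, key),
        ).fetchone()
        return row[0] if row else None
```

Writes are transactional, so a crash mid-update never leaves a half-written preference — the main advantage over plain files.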
Best for: Cross-session continuity, learning from past experiences, building knowledge bases.
Episodic memory stores complete "episodes" — full sequences of actions and outcomes. This lets agents learn from past successes and failures. Think of it as a decision journal.
```python
# Episodic memory for learning from past tasks
import hashlib
from datetime import datetime

class EpisodicMemory:
    def __init__(self, store):
        self.store = store

    def record_episode(self, task, actions, outcome, lessons):
        episode = {
            "task": task,
            "actions": actions,
            "outcome": outcome,  # "success" | "failure" | "partial"
            "lessons": lessons,
            "timestamp": datetime.now().isoformat()
        }
        # hashlib gives a key that's stable across runs
        # (built-in hash() is randomized per process for strings)
        task_id = hashlib.sha256(task.encode()).hexdigest()[:12]
        self.store.save(
            key=f"episode_{task_id}",
            data=episode,
            category="episodes"
        )

    def recall_similar(self, current_task):
        """Find past episodes similar to the current task"""
        episodes = self.store.search(current_task)
        # Prioritize successful episodes
        return sorted(
            episodes,
            key=lambda e: e["data"]["outcome"] == "success",
            reverse=True
        )
```
Best for: Improving agent performance over time, avoiding repeated mistakes, task planning based on past experience.
In production, you combine multiple memory types. Here are the most common patterns:
Like a CPU cache hierarchy: fast/small working memory at the top, slow/large persistent memory at the bottom. The agent promotes frequently-accessed memories and demotes stale ones.
```
Working Memory (context window)
        ↑↓ promote/demote
Short-Term Cache (recent 50 interactions)
        ↑↓ consolidate/retrieve
Long-Term Store (vector DB + files)
        ↑↓ archive/search
Archive (compressed historical data)
```
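The promote/demote step can be sketched as a bounded "hot" tier in front of a cold store. This toy version uses least-recently-used order as the demotion policy; the two-dict design is illustrative, not a prescribed implementation:

```python
from collections import OrderedDict

class TieredMemory:
    def __init__(self, hot_size=50):
        self.hot = OrderedDict()   # working set: small, always in context
        self.cold = {}             # persistent store: large, fetched on demand
        self.hot_size = hot_size

    def put(self, key, value):
        self.hot[key] = value
        self.hot.move_to_end(key)
        # Demote the least-recently-used entries once the hot tier overflows
        while len(self.hot) > self.hot_size:
            old_key, old_val = self.hot.popitem(last=False)
            self.cold[old_key] = old_val

    def get(self, key):
        if key in self.hot:
            self.hot.move_to_end(key)          # refresh recency
            return self.hot[key]
        if key in self.cold:
            self.put(key, self.cold.pop(key))  # promote on access
            return self.hot[key]
        return None
```

In production the cold tier would be a vector DB or file store rather than a dict, but the promotion logic is the same.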
The most popular pattern in 2026. Instead of stuffing everything into the context window, store memories externally and retrieve only what's relevant for the current task.
```python
# RAG memory pipeline
def build_context(query, memory_store, max_tokens=4000):
    # 1. Retrieve relevant memories
    relevant = memory_store.recall(query, n_results=10)

    # 2. Rank by relevance + recency
    scored = []
    for mem in relevant:
        relevance = mem["similarity_score"]
        recency = time_decay(mem["timestamp"])  # exponential decay helper
        score = 0.7 * relevance + 0.3 * recency
        scored.append((score, mem))

    # 3. Pack into the context budget, highest score first
    #    (sort on the score alone; comparing (score, dict) tuples
    #    would raise TypeError on tied scores)
    context_parts = []
    token_count = 0
    for score, mem in sorted(scored, key=lambda s: s[0], reverse=True):
        mem_tokens = count_tokens(mem["text"])
        if token_count + mem_tokens > max_tokens:
            break
        context_parts.append(mem["text"])
        token_count += mem_tokens

    return "\n---\n".join(context_parts)
```
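The pipeline leans on two helpers, `time_decay` and `count_tokens`, that it doesn't define. One plausible sketch, assuming timestamps are Unix epoch seconds and using a rough 4-characters-per-token estimate:

```python
import math
import time

def time_decay(timestamp, half_life_days=30):
    """Exponential decay: 1.0 for a brand-new memory, 0.5 after one half-life."""
    age_days = (time.time() - timestamp) / 86400
    return math.exp(-math.log(2) * age_days / half_life_days)

def count_tokens(text):
    """Crude estimate (~4 chars/token for English); swap in the model's
    real tokenizer for exact budgeting in production."""
    return max(1, len(text) // 4)
```

The half-life is a tuning knob: 30 days means a month-old memory contributes half the recency weight of a fresh one.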
For complex domains, organize memories as entities and relationships rather than flat text. This enables reasoning over connections.
```json
{
  "entities": {
    "user_123": {"type": "user", "prefs": {"lang": "en", "tone": "casual"}},
    "project_abc": {"type": "project", "status": "active", "stack": "Next.js"},
    "bug_456": {"type": "issue", "severity": "high", "status": "fixed"}
  },
  "relations": [
    {"from": "user_123", "to": "project_abc", "type": "owns"},
    {"from": "bug_456", "to": "project_abc", "type": "affects"},
    {"from": "user_123", "to": "bug_456", "type": "reported"}
  ]
}
```
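A few lines of traversal code are enough to answer questions over this structure. A sketch, mirroring the JSON above as a Python dict:

```python
graph = {
    "entities": {
        "user_123": {"type": "user"},
        "project_abc": {"type": "project"},
        "bug_456": {"type": "issue"},
    },
    "relations": [
        {"from": "user_123", "to": "project_abc", "type": "owns"},
        {"from": "bug_456", "to": "project_abc", "type": "affects"},
        {"from": "user_123", "to": "bug_456", "type": "reported"},
    ],
}

def related(graph, entity, rel_type=None):
    """All entities connected to `entity`, optionally filtered by relation type."""
    out = []
    for r in graph["relations"]:
        if rel_type and r["type"] != rel_type:
            continue
        if r["from"] == entity:
            out.append(r["to"])
        elif r["to"] == entity:
            out.append(r["from"])
    return out
```

This is what flat text memories can't do: "which issues affect the project this user owns?" becomes two hops rather than a fuzzy search.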
| Tool | Type | Best For | Cost |
|---|---|---|---|
| ChromaDB | Vector DB (local) | Small-to-mid agents, local dev | Free / open-source |
| Pinecone | Vector DB (cloud) | Production scale, managed | Free tier, then $70+/mo |
| Weaviate | Vector DB (hybrid) | Hybrid search (keyword + vector) | Free OSS, cloud from $25/mo |
| Qdrant | Vector DB | High-performance, Rust-based | Free OSS, cloud from $25/mo |
| SQLite + FTS5 | Relational + fulltext | Structured data, simple keyword search | Free |
| Mem0 | Memory layer | Drop-in agent memory, auto-categorization | Free tier, then $49/mo |
| Plain files (JSON/MD) | File system | Simple agents, human-readable | Free |
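As a concrete example of the SQLite + FTS5 row, here is a minimal keyword-search sketch; the `mem` table and its contents are illustrative:

```python
import sqlite3

db = sqlite3.connect(":memory:")
# FTS5 virtual table: every column is full-text indexed
db.execute("CREATE VIRTUAL TABLE mem USING fts5(key, body)")
db.executemany("INSERT INTO mem VALUES (?, ?)", [
    ("lesson_api_retry", "Retry the API with exponential backoff"),
    ("user_tone", "User prefers a casual tone in emails"),
])
# MATCH performs tokenized full-text search, not substring matching
rows = db.execute("SELECT key FROM mem WHERE mem MATCH 'backoff'").fetchall()
```

No server, no embeddings, and ranking via `bm25()` is available when you need it; the trade-off is keyword matching only, with no semantic similarity.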
Storing everything is tempting but counterproductive. An agent drowning in memories performs worse, not better. Be selective: save decisions, lessons, and key facts — not raw logs.
```python
# Bad: storing a raw conversation transcript
memory.save("conv_12345", entire_conversation_transcript)

# Good: extracting and storing the lesson
memory.save("lesson_api_retry", {
    "context": "Beehiiv API returns 429 during peak hours",
    "solution": "Retry with exponential backoff, max 3 attempts",
    "learned_from": "newsletter pipeline failure on 2026-03-15"
})
```
Memories from 6 months ago might be wrong today. Code changes, APIs update, preferences shift. Implement decay or validation: down-weight or expire old memories, and re-verify critical facts before acting on them.
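One lightweight form of validation is a per-category time-to-live. This sketch assumes memories carry the `saved_at` and `category` fields used earlier; the TTL values are illustrative:

```python
from datetime import datetime, timedelta

# Shelf life per memory category; anything else defaults to 90 days
TTL = {
    "reference": timedelta(days=30),   # external info goes stale fast
    "episode": timedelta(days=180),    # lessons stay useful longer
}

def is_stale(memory, now=None):
    """True if this memory is past its category's shelf life."""
    now = now or datetime.now()
    saved = datetime.fromisoformat(memory["saved_at"])
    ttl = TTL.get(memory.get("category"), timedelta(days=90))
    return now - saved > ttl
```

Stale memories don't have to be deleted; flagging them lets the agent re-verify before relying on them.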
Injecting too many memories into the prompt wastes tokens and confuses the model. Budget your memory injection: cap retrieved memories to a fixed token budget (a few thousand tokens) and include only the highest-scoring ones.
Without cleanup, memory stores accumulate contradictions. If your agent learned "use API v1" in January and "use API v2" in March, both memories exist. Implement conflict resolution: timestamp every memory, prefer the most recent version of a fact, and periodically prune superseded entries.
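The simplest resolution policy is last-write-wins: keep only the newest entry per key. A sketch, assuming each memory carries `key` and `saved_at` (ISO timestamps compare correctly as strings):

```python
def resolve_conflicts(memories):
    """Collapse contradictory memories: keep the most recent entry per key."""
    latest = {}
    for m in memories:
        key = m["key"]
        if key not in latest or m["saved_at"] > latest[key]["saved_at"]:
            latest[key] = m
    return list(latest.values())
```

Run this as a periodic cleanup pass, or at retrieval time so the agent only ever sees one version of each fact.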
Here's a practical implementation for a production agent:
```python
# What categories of information does your agent need to remember?
MEMORY_TYPES = {
    "user": "Who the user is, their preferences and context",
    "project": "Active projects, goals, deadlines",
    "feedback": "What to do/not do based on past corrections",
    "reference": "Where to find external information",
    "episode": "Past task attempts and their outcomes"
}
```
```python
import os
from datetime import datetime

def save_memory(memory_dir, name, content, mem_type, description):
    os.makedirs(memory_dir, exist_ok=True)
    filepath = os.path.join(memory_dir, f"{name}.md")
    with open(filepath, "w") as f:
        f.write("---\n")
        f.write(f"name: {name}\n")
        f.write(f"description: {description}\n")
        f.write(f"type: {mem_type}\n")
        f.write(f"updated: {datetime.now().isoformat()}\n")
        f.write("---\n\n")
        f.write(content)
```
```python
# Keep a lightweight index for fast lookup:
# load it at session start, search without reading every file
import os

def build_index(memory_dir):
    index = []
    for fname in os.listdir(memory_dir):
        if fname.endswith(".md") and fname != "INDEX.md":
            path = os.path.join(memory_dir, fname)
            with open(path) as fh:
                lines = fh.readlines()
            # Parse the frontmatter between the opening and closing "---"
            meta = {}
            if lines and lines[0].strip() == "---":
                for line in lines[1:]:
                    if line.strip() == "---":
                        break
                    key, _, val = line.partition(":")
                    meta[key.strip()] = val.strip()
            index.append({"file": fname, **meta})
    return index
```
```python
from datetime import datetime

def retrieve_relevant(index, task_description, max_results=5):
    """Score memories by relevance to the current task"""
    scores = []
    for entry in index:
        # Simple keyword-overlap scoring
        desc_words = set(entry.get("description", "").lower().split())
        task_words = set(task_description.lower().split())
        overlap = len(desc_words & task_words)
        # Recency bonus, decaying to zero over 90 days
        days_old = (datetime.now() -
                    datetime.fromisoformat(entry.get("updated", "2020-01-01"))).days
        recency = max(0, 1 - days_old / 90)
        score = overlap * 2 + recency
        scores.append((score, entry))
    # Sort on the score alone (comparing (score, dict) tuples
    # would raise TypeError on tied scores)
    return sorted(scores, key=lambda s: s[0], reverse=True)[:max_results]
```
```python
import os

def build_system_prompt(base_prompt, memory_dir, current_task):
    index = build_index(memory_dir)
    relevant = retrieve_relevant(index, current_task)

    memory_context = "\n\n## Relevant Memories\n"
    for score, entry in relevant:
        filepath = os.path.join(memory_dir, entry["file"])
        with open(filepath) as f:
            content = f.read()
        memory_context += f"\n### {entry.get('name', entry['file'])}\n"
        memory_context += content + "\n"

    return base_prompt + memory_context
```
Most frameworks now include memory primitives:
- LangChain: `ConversationBufferMemory`, `ConversationSummaryMemory`, `VectorStoreRetrieverMemory`. Rich ecosystem, but can be over-abstracted.
- AutoGen: teachable agents that learn from feedback, storing lessons in a vector DB automatically.

| Scenario | Memory Type | Implementation |
|---|---|---|
| Chatbot remembers user preferences | Long-term (structured) | SQLite or JSON files |
| Agent searches past conversations | Long-term (semantic) | Vector DB (Chroma, Qdrant) |
| Multi-step task tracking | Working + short-term | Context window + conversation history |
| Learning from past mistakes | Episodic | Structured logs + retrieval |
| 24/7 autonomous agent | All four types | Files + daily notes + vector DB |
| Customer support bot | Short-term + long-term | Session history + customer profile DB |
Our AI Agent Playbook includes complete memory system templates, SOUL.md examples, and production patterns for persistent agents.
Get the Playbook — $29

Get the latest on agent memory, frameworks, and autonomous systems. 3x/week, no spam.
Subscribe to AI Agents Weekly