Agent Memory Systems: How to Build AI Agents That Learn from Conversations
Implementation guide for agent memory: session management, long-term storage, context windows, and memory architectures for agents that remember past interactions.
TL;DR
User's first conversation:
User: "I prefer communications by email, not phone."
Agent: "Got it, I'll note that."
User's second conversation (next day):
User: "Can you contact me about this issue?"
Agent: "Sure! What's the best way to reach you, email or phone?"
User: 😡 "I told you yesterday, email only!"
Problem: Agent forgot. Users expect agents to remember context, preferences, past interactions.
Here's how to build memory into agents.
Short-Term Memory

What: Recent conversation history (last 3-10 turns).
Duration: Current session only.
Use: Maintain coherent conversation flow.
Example:
User: "What's the weather in London?"
Agent: "It's 15°C and cloudy."
User: "What about tomorrow?"
Agent: [Knows "What about tomorrow" = weather in London tomorrow]
Implementation: Simple buffer (keep last N messages).
Long-Term Memory

What: Persistent facts about the user (preferences, history, profile).
Duration: Across sessions (days, months, years).
Use: Personalization, continuity across conversations.
Example:
Session 1: User shares preference for email
Session 2 (next week): Agent remembers, uses email without asking
Implementation: Database storage (SQL, NoSQL, vector DB).
Semantic Memory

What: External knowledge retrieved on demand (RAG).
Duration: Per-query (not stored in conversation).
Use: Answer questions using knowledge base without fine-tuning.
Example:
User: "What's our return policy?"
Agent: [Retrieves policy from knowledge base, doesn't memorize it]
Implementation: Vector database + retrieval. Covered in our RAG guide.
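Semantic retrieval is covered in depth in the RAG guide, but the core step, embedding a query and ranking documents by similarity, can be sketched with a toy in-memory retriever. The bag-of-words `embed` below is a hypothetical stand-in for a real embedding model, used only so the example runs standalone:

```python
import math
import re
from collections import Counter

def embed(text):
    """Toy bag-of-words 'embedding'. A real system would call an
    embedding model; this stand-in just keeps the sketch runnable."""
    return Counter(re.findall(r"\w+", text.lower()))

def cosine(a, b):
    dot = sum(a[t] * b.get(t, 0) for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

class ToyRetriever:
    def __init__(self, documents):
        self.docs = [(doc, embed(doc)) for doc in documents]

    def retrieve(self, query, top_k=1):
        query_vec = embed(query)
        ranked = sorted(self.docs, key=lambda d: cosine(query_vec, d[1]),
                        reverse=True)
        return [doc for doc, _ in ranked[:top_k]]

kb = ToyRetriever([
    "Return policy: items can be returned within 30 days.",
    "Shipping: orders ship within 2 business days.",
])
answer_docs = kb.retrieve("What's our return policy?")
```

A production version swaps `embed` for a model API and `ToyRetriever` for a vector database; the retrieve-then-prompt flow stays the same.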
This guide focuses on Short-Term and Long-Term memory.
Keep last N messages in context window.
```python
class BufferMemory:
    def __init__(self, max_messages=10):
        self.messages = []
        self.max_messages = max_messages

    def add_message(self, role, content):
        self.messages.append({"role": role, "content": content})
        if len(self.messages) > self.max_messages:
            self.messages.pop(0)  # Remove oldest

    def get_context(self):
        return self.messages

# Usage
memory = BufferMemory(max_messages=6)  # Last 3 turns (6 messages)
memory.add_message("user", "What's the weather?")
memory.add_message("assistant", "It's sunny, 22°C.")
memory.add_message("user", "What about tomorrow?")

# Agent sees all 3 messages for context
context = memory.get_context()
```
Pros:
- Simple to implement; no extra LLM calls
- Exact wording of every message is preserved

Cons:
- Token cost grows with every turn
- Anything older than the last N messages is lost entirely
Use when: Conversations <10 turns, <2K tokens total.
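A quick heuristic for checking those limits in code: estimate tokens with the rough rule of ~4 characters per English token (a real system would use the model's tokenizer), and switch strategies once either bound is exceeded. The function names and thresholds here are illustrative:

```python
def estimate_tokens(messages, chars_per_token=4):
    """Very rough token estimate (~4 characters per English token).
    A production system would use the model's real tokenizer."""
    total_chars = sum(len(m["content"]) for m in messages)
    return total_chars // chars_per_token

def should_summarize(messages, max_turns=10, max_tokens=2000):
    """Switch from plain buffering to summarization past either limit."""
    return len(messages) > 2 * max_turns or estimate_tokens(messages) > max_tokens

history = [
    {"role": "user", "content": "What's the weather in London?"},
    {"role": "assistant", "content": "It's 15°C and cloudy."},
]
print(should_summarize(history))  # a 2-message chat is well under both limits
```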
Summarize old conversation, keep recent messages verbatim.
```python
class SummaryMemory:
    def __init__(self, recent_k=4, summarize_threshold=10):
        self.messages = []
        self.summary = None
        self.recent_k = recent_k
        self.summarize_threshold = summarize_threshold

    def add_message(self, role, content):
        self.messages.append({"role": role, "content": content})
        if len(self.messages) > self.summarize_threshold:
            self._summarize_old_messages()

    def _summarize_old_messages(self):
        old_messages = self.messages[:-self.recent_k]
        # Use a cheap model to summarize; fold in any previous summary
        # so earlier context isn't silently dropped
        previous = f"Previous summary: {self.summary}\n" if self.summary else ""
        summary_prompt = f"{previous}Summarize this conversation:\n{old_messages}"
        self.summary = call_llm(summary_prompt, model="gpt-3.5-turbo")
        # Keep only the recent messages verbatim
        self.messages = self.messages[-self.recent_k:]

    def get_context(self):
        context = []
        if self.summary:
            context.append({"role": "system", "content": f"Summary of earlier conversation: {self.summary}"})
        context.extend(self.messages)
        return context
```
Example:
After 12 messages:
Summary: "User asked about product features. Agent explained A, B, C. User expressed interest in B."
Recent messages:
User: "What's the price for B?"
Agent: "$99/month"
User: "Any discounts?"
Total tokens: 150 (summary) + 50 (recent) = 200 tokens
vs Buffer: 1,200 tokens (all 12 messages)
Savings: 83% reduction in context tokens.
Pros:
- Context size stays bounded no matter how long the conversation runs
- Large token savings (~83% in the example above)

Cons:
- Summaries lose detail and exact wording
- Each summarization adds an extra LLM call and some latency
Use when: Conversations >10 turns, cost-sensitive.
Keep recent messages + important moments from earlier.
```python
class WindowMemory:
    def __init__(self, window_size=6, highlights_size=3):
        self.messages = []
        self.highlights = []  # Important messages
        self.window_size = window_size
        self.highlights_size = highlights_size

    def add_message(self, role, content, is_important=False):
        msg = {"role": role, "content": content}
        self.messages.append(msg)
        if is_important:
            self.highlights.append(msg)
            if len(self.highlights) > self.highlights_size:
                self.highlights.pop(0)

    def get_context(self):
        recent = self.messages[-self.window_size:]
        return self.highlights + recent  # Highlights + recent window
```
How to determine "important":
```python
def is_important(message):
    # Rule-based check first
    important_keywords = ["prefer", "always", "never", "email me", "don't call"]
    if any(kw in message.lower() for kw in important_keywords):
        return True
    # Fall back to a cheap LLM classifier
    prompt = f"Is this message important to remember? (yes/no): {message}"
    response = call_llm(prompt, model="gpt-3.5-turbo")
    return "yes" in response.lower()
```
Use when: Need full detail + cost efficiency, can identify important moments.
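To make the flow concrete, here is a runnable end-to-end sketch. The class is restated compactly so the snippet stands alone, and only the keyword rule is used (no LLM call); an important preference survives even after it scrolls out of the recent window:

```python
class WindowMemory:
    """Compact restatement of the class above, for a standalone demo."""
    def __init__(self, window_size=6, highlights_size=3):
        self.messages, self.highlights = [], []
        self.window_size, self.highlights_size = window_size, highlights_size

    def add_message(self, role, content, is_important=False):
        msg = {"role": role, "content": content}
        self.messages.append(msg)
        if is_important:
            self.highlights.append(msg)
            if len(self.highlights) > self.highlights_size:
                self.highlights.pop(0)

    def get_context(self):
        return self.highlights + self.messages[-self.window_size:]

IMPORTANT_KEYWORDS = ("prefer", "always", "never", "email me", "don't call")

def rule_based_important(content):
    return any(kw in content.lower() for kw in IMPORTANT_KEYWORDS)

memory = WindowMemory(window_size=2)
pref = "I prefer email, never phone."
memory.add_message("user", pref, is_important=rule_based_important(pref))
for i in range(5):  # filler turns push the preference out of the window
    memory.add_message("user", f"filler message {i}")

context = memory.get_context()
# The preference is still present as a highlight, ahead of the recent window.
```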
Extract facts about user, store persistently.
```python
import sqlite3

class EntityMemory:
    def __init__(self, user_id):
        self.user_id = user_id
        self.db = sqlite3.connect('memory.db')
        self._create_table()

    def _create_table(self):
        self.db.execute("""
            CREATE TABLE IF NOT EXISTS user_facts (
                user_id TEXT,
                key TEXT,
                value TEXT,
                timestamp DATETIME DEFAULT CURRENT_TIMESTAMP,
                PRIMARY KEY (user_id, key)
            )
        """)

    def store_fact(self, key, value):
        self.db.execute("""
            INSERT OR REPLACE INTO user_facts (user_id, key, value)
            VALUES (?, ?, ?)
        """, (self.user_id, key, value))
        self.db.commit()

    def get_fact(self, key):
        cursor = self.db.execute("""
            SELECT value FROM user_facts
            WHERE user_id = ? AND key = ?
        """, (self.user_id, key))
        result = cursor.fetchone()
        return result[0] if result else None

    def get_all_facts(self):
        cursor = self.db.execute("""
            SELECT key, value FROM user_facts WHERE user_id = ?
        """, (self.user_id,))
        return dict(cursor.fetchall())
```
```python
# Usage
memory = EntityMemory(user_id="user_123")

# Extract from conversation
message = "I prefer email communication, not phone calls."

# Use LLM to extract fact
fact_prompt = f"""
Extract key facts from this message in JSON format:
Message: {message}
Return: {{"key": "communication_preference", "value": "email"}}
"""
fact = extract_fact_with_llm(fact_prompt)
memory.store_fact(fact['key'], fact['value'])

# Later conversation
prefs = memory.get_all_facts()
# {'communication_preference': 'email'}

# Include in agent prompt
system_prompt = f"""
You are a helpful assistant.
User preferences: {prefs}
"""
```
What to store:
- Preferences (communication channel, product interests)
- Profile facts (name, timezone, plan or tier)
- History (past issues, purchases, decisions)
Extraction pipeline:
```python
import json

def extract_entities_from_conversation(conversation):
    prompt = f"""
Extract important facts about the user from this conversation.
Return as JSON list: [{{"key": "...", "value": "..."}}, ...]

Conversation:
{conversation}

Facts:
"""
    response = call_llm(prompt, model="gpt-4-turbo")
    facts = json.loads(response)
    return facts
```
Run after each conversation, store facts in database.
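End to end, that post-conversation step might look like the sketch below. `extract_facts_stub` is a hypothetical stand-in for the `call_llm` + `json.loads` extraction above, so the example runs offline; the storage uses the same `user_facts` schema as `EntityMemory`:

```python
import sqlite3

def extract_facts_stub(conversation):
    """Hypothetical stand-in for the LLM extraction step, so this
    sketch runs without an API call."""
    facts = []
    if "email" in conversation.lower():
        facts.append({"key": "communication_preference", "value": "email"})
    return facts

def persist_facts(db, user_id, conversation):
    """Run after each conversation: extract facts, then upsert them."""
    for fact in extract_facts_stub(conversation):
        db.execute(
            "INSERT OR REPLACE INTO user_facts (user_id, key, value) "
            "VALUES (?, ?, ?)",
            (user_id, fact["key"], fact["value"]),
        )
    db.commit()

db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE user_facts (user_id TEXT, key TEXT, value TEXT, "
    "PRIMARY KEY (user_id, key))"
)
persist_facts(db, "user_123", "I prefer email communication, not phone calls.")
row = db.execute(
    "SELECT value FROM user_facts WHERE user_id = 'user_123' "
    "AND key = 'communication_preference'"
).fetchone()
```

`INSERT OR REPLACE` keyed on `(user_id, key)` means re-running the pipeline updates a fact rather than duplicating it.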
Without memory (typical query):
- System prompt: 100 tokens
- User query: 50 tokens
- Total input: 150 tokens
- Cost: 150 × $0.01/1K = $0.0015

With buffer memory (10-turn conversation):
- System prompt: 100 tokens
- Conversation history: 2,000 tokens (10 turns)
- User query: 50 tokens
- Total input: 2,150 tokens
- Cost: 2,150 × $0.01/1K = $0.0215

Roughly 14× more expensive than no memory.

With summary memory (same conversation):
- System prompt: 100 tokens
- Summary: 200 tokens
- Recent messages (4): 400 tokens
- User query: 50 tokens
- Total input: 750 tokens
- Cost: 750 × $0.01/1K = $0.0075

Roughly 3× cheaper than buffer memory, 5× more expensive than no memory.

With entity memory only:
- System prompt: 100 tokens
- User facts: 50 tokens ("communication_preference: email")
- User query: 50 tokens
- Total input: 200 tokens
- Cost: 200 × $0.01/1K = $0.002

33% more expensive than no memory, roughly 10× cheaper than buffer.
| Strategy | Tokens per Query | Cost per Query | Use Case |
|---|---|---|---|
| No memory | 150 | $0.0015 | One-off queries, no context needed |
| Entity only | 200 | $0.0020 | Personalization without conversation history |
| Summary | 750 | $0.0075 | Long conversations, cost-sensitive |
| Buffer (10 turns) | 2,150 | $0.0215 | Short conversations, need exact history |
Recommendation: Start with summary memory + entity memory. Best cost/quality trade-off.
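The arithmetic behind the table reduces to a one-line formula; the $0.01/1K input rate is the illustrative price used throughout this section, not any specific provider's pricing:

```python
PRICE_PER_1K_INPUT = 0.01  # illustrative input-token rate used in this section

def query_cost(input_tokens, price_per_1k=PRICE_PER_1K_INPUT):
    """Cost of one query's input tokens at a flat per-1K rate."""
    return input_tokens * price_per_1k / 1000

# Token counts per strategy, from the table above
strategies = {
    "no_memory": 150,
    "entity_only": 200,
    "summary": 750,
    "buffer_10_turns": 2150,
}
costs = {name: query_cost(tokens) for name, tokens in strategies.items()}
```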
Before memory:
- Users had to repeat preferences and context in every session

After adding memory:
- Satisfaction jumped 34%; first-contact resolution improved 28%

Cost impact:
- Roughly 5-6× more input tokens per query (summary + entity memory vs. no memory)
Quote from Maria Santos, Head of Support: "Adding memory to our support agent was game-changing. Users stopped having to repeat themselves. Satisfaction jumped 34%, first-contact resolution improved 28%."
Combine all three types:
```python
class HybridMemory:
    def __init__(self, user_id):
        self.short_term = SummaryMemory()       # Conversation context
        self.long_term = EntityMemory(user_id)  # User facts
        self.semantic = RAGRetriever()          # Knowledge base

    def build_context(self, user_query):
        # 1. Get conversation history
        conversation_context = self.short_term.get_context()
        # 2. Get user facts
        user_facts = self.long_term.get_all_facts()
        # 3. Retrieve relevant knowledge
        knowledge = self.semantic.retrieve(user_query, top_k=3)
        # 4. Combine into a prompt
        prompt = f"""
User facts: {user_facts}

Relevant knowledge:
{knowledge}

Conversation history:
{conversation_context}

User query: {user_query}
"""
        return prompt
```
Result: Agent has short-term context + knows user + accesses knowledge base.
How long should I keep conversation history?
Short-term: current session only (clear after the session ends or after ~30 minutes of inactivity).
Long-term: indefinitely (disk is cheap, and users expect permanent memory).
Exception: Privacy-sensitive conversations (medical, legal). Auto-delete after N days per compliance.
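The 30-minute inactivity rule can be sketched as follows. The `SessionStore` class and injectable `clock` are illustrative choices for this guide, not a library API; only short-term history is cleared, while long-term facts in the database are untouched:

```python
import time

SESSION_TIMEOUT = 30 * 60  # clear short-term memory after 30 min idle

class SessionStore:
    """Minimal sketch: short-term history resets after inactivity."""
    def __init__(self, timeout=SESSION_TIMEOUT, clock=time.time):
        self.timeout = timeout
        self.clock = clock  # injectable so the timeout is testable
        self.messages = []
        self.last_active = clock()

    def add_message(self, role, content):
        if self.clock() - self.last_active > self.timeout:
            self.messages = []  # stale session: start fresh
        self.last_active = self.clock()
        self.messages.append({"role": role, "content": content})

# Demo with a fake clock
fake_now = [0.0]
store = SessionStore(clock=lambda: fake_now[0])
store.add_message("user", "hi")
fake_now[0] = 31 * 60  # 31 minutes of silence
store.add_message("user", "hello again")
# only the post-timeout message remains in short-term memory
```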
What about GDPR/privacy regulations?
Store minimum necessary:
- Extracted facts, not full transcripts
- No sensitive categories (health, financial) unless the use case requires them
- A deletion path for every table keyed by user_id
Implementation:
```python
def delete_user_data(db, user_id):
    # GDPR right to be forgotten: wipe every table keyed by user_id
    db.execute("DELETE FROM user_facts WHERE user_id = ?", (user_id,))
    db.execute("DELETE FROM conversation_history WHERE user_id = ?", (user_id,))
    db.commit()
```
How do I handle memory across multiple agents?
Shared memory store: All agents access same database.
```python
# Agent A stores a fact
memory_a = EntityMemory(user_id="user_123")
memory_a.store_fact("timezone", "UTC-8")

# Agent B retrieves the same fact
memory_b = EntityMemory(user_id="user_123")
timezone = memory_b.get_fact("timezone")  # "UTC-8"
```
Consistency: Both agents see same user facts.
Bottom line: Memory transforms stateless agents into personalized assistants. Use summary memory for conversations, entity memory for user facts. Costs 5-6× more but improves satisfaction 30-40% for customer-facing use cases.
Next: Read our Multi-Agent Systems guide for memory sharing across agents.