Memory types (AI)
Overview
Memory types in AI agents refer to distinct categories of information storage and recall mechanisms that enable agents to learn from interactions, retain knowledge, and improve performance over time. These systems parallel human cognitive memory structures but operate through computational mechanisms such as vector embeddings, parametric weights, and external databases. The taxonomy typically includes semantic memory (factual knowledge and concepts), episodic memory (specific experiences and events), procedural memory (learned skills and procedures), and working memory (active information processing within a task).
Agent memory systems operate at multiple timescales and on different substrates. Short-term mechanisms include the context window of a language model, which retains information only for the duration of a single request. Medium-term mechanisms involve persistent storage separate from model parameters, such as vector databases or knowledge graphs, which the agent queries at inference time. Long-term mechanisms integrate learned patterns into the model weights themselves through fine-tuning, or rely on persistent embeddings that support contextual retrieval.
The design of agent memory architectures directly shapes system capabilities in multi-agent systems, agentic workflows, and planning tasks. Agents with well-structured memory systems can maintain consistency across interactions, use retrieval-augmented generation to ground outputs in external facts, and run self-reflection loops that can reduce hallucination and improve factual consistency.
How it works
Memory types in AI agents function through distinct mechanisms mapped to their cognitive analogues:
Semantic memory stores generalized knowledge representations, typically as embedding vectors in dense retrieval systems or as structured knowledge graphs. When an agent requires factual information, it performs semantic search or hybrid search over these representations. In dense retrieval, queries and documents are encoded into a shared vector space and ranked by similarity, so relevant knowledge can be retrieved without storing explicit episodic links to where that knowledge originated.
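Dense semantic retrieval reduces to ranking stored vectors by similarity to a query vector. The sketch below uses cosine similarity and brute-force ranking; the function names `cosine_similarity` and `retrieve` are hypothetical, and the 3-dimensional "embeddings" are hand-written toy values standing in for the output of a real embedding model. Production systems typically use an approximate nearest-neighbor index instead of scanning every entry.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; returns 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(query_vec: list[float], memory: dict[str, list[float]], k: int = 2) -> list[str]:
    """Return the k stored texts whose embeddings are most similar to the query."""
    scored = [(cosine_similarity(query_vec, vec), text) for text, vec in memory.items()]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [text for _, text in scored[:k]]


# Toy 3-dimensional "embeddings", hand-written for illustration only.
semantic_memory = {
    "Paris is the capital of France.": [0.9, 0.1, 0.0],
    "Water boils at 100 C at sea level.": [0.0, 0.2, 0.9],
    "The Louvre is a museum in Paris.": [0.8, 0.3, 0.1],
}

print(retrieve([1.0, 0.2, 0.0], semantic_memory, k=2))
# ['Paris is the capital of France.', 'The Louvre is a museum in Paris.']
```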
Episodic memory records specific interactions, events, and their contextual details with metadata (timestamp, agent state, outcome). In practice, episodic memory is implemented as indexed transcript databases, conversation logs, or long-term memory stores that support retrieval by temporal proximity or relevance. Engram-based parametric personalization associates specific users or contexts with episodic traces that influence future outputs.
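Retrieval "by temporal proximity or relevance" can be combined into a single score. The following is a minimal sketch under stated assumptions: each episode carries the metadata listed above, relevance is approximated by word-overlap (Jaccard) similarity rather than embeddings, and recency is an exponential decay with a configurable half-life. The class names, the `recency_weight` of 0.2, and the one-hour half-life are all hypothetical choices.

```python
from dataclasses import dataclass


@dataclass
class Episode:
    text: str         # what happened
    timestamp: float  # seconds on any shared clock
    state: str        # agent state when the episode was recorded
    outcome: str      # e.g. "success" or "failure"


def _words(s: str) -> set[str]:
    return set(s.lower().split())


def relevance(query: str, episode: Episode) -> float:
    """Jaccard overlap of word sets, standing in for embedding similarity."""
    q, e = _words(query), _words(episode.text)
    union = q | e
    return len(q & e) / len(union) if union else 0.0


def recency(episode: Episode, now: float, half_life: float) -> float:
    """Exponential decay: 1.0 for a brand-new episode, 0.5 after one half-life."""
    age = max(0.0, now - episode.timestamp)
    return 0.5 ** (age / half_life)


class EpisodicMemory:
    def __init__(self, half_life: float = 3600.0, recency_weight: float = 0.2):
        self.episodes: list[Episode] = []
        self.half_life = half_life
        self.recency_weight = recency_weight

    def record(self, episode: Episode) -> None:
        self.episodes.append(episode)

    def recall(self, query: str, now: float, k: int = 1) -> list[Episode]:
        """Rank stored episodes by relevance plus a weighted recency bonus."""
        def score(ep: Episode) -> float:
            return relevance(query, ep) + self.recency_weight * recency(ep, now, self.half_life)
        return sorted(self.episodes, key=score, reverse=True)[:k]


mem = EpisodicMemory()
mem.record(Episode("user asked to reset password", 0.0, "support", "success"))
mem.record(Episode("user asked about invoice totals", 7000.0, "billing", "success"))
print(mem.recall("password reset again", now=7200.0)[0].text)
# user asked to reset password
```

When two episodes are equally relevant, the recency term breaks the tie in favor of the newer one; raising `recency_weight` shifts the balance toward temporal proximity.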
Procedural memory encodes learned behavioral patterns and task execution strategies. In neural agents, procedural knowledge is distributed across model weights and refined through RLHF, DPO, or instruction tuning. Agents also express procedural memory behaviorally, through learned prompting patterns and stereotyped response templates that emerge from training.
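As a worked illustration of one training signal that shapes procedural memory in the weights, the standard per-example DPO loss can be computed from the log-probabilities of a preferred and a dispreferred response under the policy being tuned and under a frozen reference model: the loss is the negative log-sigmoid of β times the difference of the two log-ratios. This is a minimal sketch assuming those sequence log-probabilities are already available; `dpo_loss` is a hypothetical name and β = 0.1 is an illustrative default.

```python
import math


def dpo_loss(policy_chosen: float, policy_rejected: float,
             ref_chosen: float, ref_rejected: float, beta: float = 0.1) -> float:
    """Per-example DPO loss from summed sequence log-probabilities.

    policy_* are log pi_theta(y | x) under the model being tuned;
    ref_* are log pi_ref(y | x) under a frozen reference model.
    """
    chosen_ratio = policy_chosen - ref_chosen        # log(pi_theta / pi_ref) for the preferred response
    rejected_ratio = policy_rejected - ref_rejected  # same for the dispreferred response
    margin = beta * (chosen_ratio - rejected_ratio)
    # -log(sigmoid(margin)) == softplus(-margin), written to avoid overflow
    return max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))


# Policy identical to the reference: margin 0, loss = log 2
print(dpo_loss(-10.0, -12.0, -10.0, -12.0))
# Policy shifted toward the preferred response: loss drops below log 2
print(dpo_loss(-9.0, -13.0, -10.0, -12.0))
```

Minimizing this loss pushes the policy to raise the relative likelihood of preferred responses, which is one way a behavioral pattern becomes encoded in the weights rather than in an external store.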
Working memory in AI agents corresponds to the context window and the intermediate states maintained during chain-of-thought reasoning. It holds the current problem formulation, partial solutions, and active hypotheses available for the current inference step. Agent working memory does not persist across API calls: each request must carry whatever context the model needs, unless state is saved externally through session-level state management or through memory services connected via the Model Context Protocol (MCP).
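Because the model sees only what is sent with each call, the agent itself must decide what working context to resend. The sketch below keeps a pinned system message plus as many recent turns as fit within a token budget, evicting the oldest turns first. It assumes a crude whitespace word count in place of a real tokenizer, and the `WorkingMemory` class and its methods are hypothetical.

```python
from collections import deque


def count_tokens(text: str) -> int:
    """Crude estimate: whitespace-separated words. A real system would use the model's tokenizer."""
    return len(text.split())


class WorkingMemory:
    """A pinned system message plus as many recent turns as fit within a token budget."""

    def __init__(self, system: str, budget: int):
        self.system = system
        self.budget = budget
        self.turns: deque[str] = deque()

    def _used(self) -> int:
        return count_tokens(self.system) + sum(count_tokens(t) for t in self.turns)

    def add(self, turn: str) -> None:
        self.turns.append(turn)
        # Evict the oldest turns until everything fits; a turn larger than the
        # whole budget is itself evicted, leaving only the system message.
        while self.turns and self._used() > self.budget:
            self.turns.popleft()

    def build_request(self) -> list[str]:
        """The complete context sent on this call; nothing else carries over."""
        return [self.system, *self.turns]


wm = WorkingMemory(system="You are a helpful agent.", budget=12)
wm.add("user: what is two plus two")  # 5 + 6 = 11 tokens, fits
wm.add("assistant: four")             # 13 tokens, so the oldest turn is evicted
print(wm.build_request())
# ['You are a helpful agent.', 'assistant: four']
```

Sliding-window eviction is only one policy; summarizing evicted turns or moving them into an episodic store are common alternatives when older context still matters.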
Integration across memory types occurs through retrieval mechanisms: agents performing planning may retrieve relevant episodic examples, ground outputs in semantic facts via RAG, apply procedural heuristics, and maintain working context in-flight. LLM-as-judge systems and critic agents evaluate whether memory retrieval is appropriate and whether outputs remain grounded in retrieved information.
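One simple way to integrate these sources is to assemble them into a single context block before each model call. The sketch below assumes that retrieval has already produced lists of episodic examples and semantic facts, and that procedural heuristics are available as explicit text rules (in many agents they live only in the weights). The function name, section titles, ordering, and the refund scenario are all hypothetical, made up for illustration.

```python
def assemble_prompt(task: str,
                    episodic_examples: list[str],
                    semantic_facts: list[str],
                    procedural_heuristics: list[str],
                    working_notes: list[str]) -> str:
    """Combine retrieved memories into one context block for the next model call.

    Section names and ordering are illustrative choices, not a standard format.
    """
    sections = [
        ("Relevant past episodes", episodic_examples),
        ("Retrieved facts", semantic_facts),
        ("Heuristics", procedural_heuristics),
        ("Scratchpad", working_notes),
    ]
    parts = []
    for title, items in sections:
        if items:  # skip empty memory sources entirely
            parts.append(title + ":\n" + "\n".join(f"- {item}" for item in items))
    parts.append(f"Task: {task}")
    return "\n\n".join(parts)


prompt = assemble_prompt(
    task="Decide whether to approve the refund request.",
    episodic_examples=["A similar refund was approved last week after the receipt was checked."],
    semantic_facts=["Refunds require a receipt."],
    procedural_heuristics=[],
    working_notes=["Customer uploaded a receipt."],
)
print(prompt)
```

Keeping each source in a labeled section also makes it easier for an LLM-as-judge or critic agent to check whether the final answer is grounded in the retrieved facts.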
| Term | Distinction |
|---|---|
| Context window | The context window is the maximum token span available to a single request; it is a working memory substrate but does not persist between requests. The memory-type taxonomy also covers persistent storage mechanisms that survive across sessions. |
| Long-term memory | Long-term memory refers specifically to persistent storage external to model parameters (e.g., vector databases, transcript archives). Memory types is the broader taxonomy covering semantic, episodic, procedural, and working memory, of which long-term memory is one temporal tier. |
| Knowledge graph | A knowledge graph is a structured data representation typically used to implement semantic memory. Memory types is the functional taxonomy; a knowledge graph is a concrete instantiation mechanism for one memory class. |
| Agent memory vs Context window | The Agent memory vs Context window article addresses the distinction between persistent agent memory (across sessions) and the context window (within a single request). Agent memory is the superset; the context window is the working memory substrate for a single turn. |
| Embeddings | Embeddings are a vector representation technique used to implement semantic and episodic memory retrieval. Memory types describes the cognitive functions; embeddings are the underlying representational mechanism. |
| Parametric vs Non-parametric memory | Parametric memory stores learned patterns in model weights (e.g., procedural memory). Non-parametric memory stores explicit records (e.g., episodic memory in transcript databases). Memory types encompasses both; the parametric/non-parametric distinction describes the storage substrate. |
Examples
OpenAI Assistants API with vector-backed retrieval and thread history: The Assistants API implements semantic memory through retrieval-augmented generation over uploaded files indexed in a vector store, and episodic memory through conversation history stored in thread objects. When a user asks a follow-up question, recent messages from the thread are placed in the context window (working memory) alongside semantically relevant passages retrieved from the vector store (semantic memory), so the assistant can refer to earlier points in the conversation and to external knowledge at the same time.
Multi-turn dialogue with chain-of-thought scratchpad: An agent solving a math word problem keeps working memory in an explicit scratchpad that lists intermediate steps, completed subgoals, and current hypotheses. As the agent progresses, earlier steps remain in the context window. If the agent discovers a generally applicable strategy (e.g., "always extract numerical quantities first"), that strategy becomes procedural memory only if it is later reinforced through training such as RLHF and thereby integrated into the model weights. Episodic memory of this specific problem-solving session is archived in logs, enabling later self-reflection or critic-agent review.
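The working-memory and episodic parts of this example can be sketched directly; the procedural part (training) is out of scope here. The hypothetical `Scratchpad` class below renders its steps and hypotheses as text to place back into the context window on each turn, then archives the finished session as one JSON line. An in-memory `io.StringIO` stands in for a persistent log file, which is an assumption for the sake of a self-contained example.

```python
import io
import json
import time
from typing import TextIO


class Scratchpad:
    """Working memory for one problem-solving session."""

    def __init__(self, problem: str):
        self.problem = problem
        self.steps: list[str] = []
        self.hypotheses: list[str] = []

    def note_step(self, step: str) -> None:
        self.steps.append(step)

    def note_hypothesis(self, hypothesis: str) -> None:
        self.hypotheses.append(hypothesis)

    def render(self) -> str:
        """Text placed back into the context window on each reasoning turn."""
        lines = [f"Problem: {self.problem}"]
        lines += [f"Step {i}: {s}" for i, s in enumerate(self.steps, start=1)]
        lines += [f"Hypothesis: {h}" for h in self.hypotheses]
        return "\n".join(lines)

    def archive(self, log: TextIO, outcome: str) -> None:
        """Append the finished session to an episodic log as one JSON line."""
        record = {"timestamp": time.time(), "problem": self.problem,
                  "steps": self.steps, "hypotheses": self.hypotheses,
                  "outcome": outcome}
        log.write(json.dumps(record) + "\n")


pad = Scratchpad("A train travels 120 km in 2 hours. What is its average speed?")
pad.note_hypothesis("average speed = distance / time")
pad.note_step("Extract quantities: distance = 120 km, time = 2 h")
pad.note_step("120 / 2 = 60, so the average speed is 60 km/h")
print(pad.render())

log = io.StringIO()  # stands in for a persistent log file
pad.archive(log, outcome="solved")
```

A later self-reflection pass or critic agent could read these JSON lines back to review which steps led to the outcome.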
Multi-agent system with Model Context Protocol memory bridge: In a customer service system, multiple specialized agents (routing agent, knowledge agent, escalation agent) coordinate through shared memory services exposed via MCP. The routing agent maintains working memory of the current ticket state. The knowledge agent accesses semantic memory (product documentation indexed in a knowledge graph) and episodic memory (past tickets with similar issues stored in a retrieval database). Procedural memory governing hand-off logic between agents is encoded in learned orchestration patterns. Episodic memory persists between ticket sessions, allowing the system to recognize returning issues and refine its procedural knowledge over time.
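The cross-session persistence that lets such a system recognize returning issues can be sketched with a store that writes to disk and is reloaded by a fresh object in a later session. In this minimal sketch a JSON file stands in for the retrieval database, and a returning issue is assumed to mean a past ticket from the same customer whose wording overlaps the new one by at least a Jaccard threshold of 0.5; `TicketHistory`, `find_returning`, and the ticket data are hypothetical.

```python
import json
import tempfile
from pathlib import Path


class TicketHistory:
    """Episodic store of resolved tickets, persisted to a JSON file between sessions."""

    def __init__(self, path: Path):
        self.path = path
        self.tickets: list[dict] = json.loads(path.read_text()) if path.exists() else []

    def add(self, customer: str, issue: str, resolution: str) -> None:
        self.tickets.append({"customer": customer, "issue": issue, "resolution": resolution})
        self.path.write_text(json.dumps(self.tickets))  # persist immediately

    def find_returning(self, customer: str, issue: str, threshold: float = 0.5) -> list[dict]:
        """Past tickets from the same customer whose issue wording overlaps the new one."""
        new_words = set(issue.lower().split())
        matches = []
        for ticket in self.tickets:
            if ticket["customer"] != customer:
                continue
            old_words = set(ticket["issue"].lower().split())
            union = new_words | old_words
            if union and len(new_words & old_words) / len(union) >= threshold:
                matches.append(ticket)
        return matches


with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "tickets.json"
    # Session 1: a ticket is resolved and recorded.
    TicketHistory(path).add("c42", "router drops wifi connection", "firmware update")
    # Session 2: a fresh object reloads the history from disk.
    hits = TicketHistory(path).find_returning("c42", "wifi connection drops again")

print(hits)
```

Because the second session starts from a new object, the match can only come from what was persisted, which is the property that distinguishes this episodic store from working memory.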
See also
- Agent memory vs Context window — formal distinction between persistent and session-bounded memory substrates
- Long-term memory (AI) — persistent external storage mechanisms for agent memory
- Retrieval-augmented generation — technique for grounding agent outputs in semantic and episodic memory
- Self-reflection (AI) — agent capability to revisit episodic and procedural memory to improve future decisions
- Multi-agent orchestration — systems-level patterns for coordinating memory access across multiple agents
- Knowledge graph — structured semantic memory representation