Long-term memory (AI)

Long-term memory (AI) — Persistent storage system enabling an agent to retain and retrieve information across multiple sessions and interactions.

Overview

Long-term memory in artificial intelligence refers to any storage mechanism that allows an agent or system to persist information beyond a single conversation or session. Unlike the context window, which is bounded by the model's maximum token limit and does not carry over between sessions, long-term memory is designed for durability and cross-session continuity. This distinction is central to agent memory architecture.

Long-term memory systems store structured and unstructured data—including interaction history, learned preferences, domain-specific facts, and embeddings—in external databases or vector stores. This enables agents to build institutional knowledge over time, reduce redundant processing, and personalize responses based on historical patterns. Long-term memory is essential for stateful agentic workflows where tasks span multiple user interactions or where an agent must accumulate evidence across sessions.

The technical implementation typically involves a combination of retrieval mechanisms (retrieval-augmented generation, semantic search, or hybrid search) and embedding models that convert stored information into a queryable format. Access to long-term memory introduces latency and requires strategies for relevance ranking (reranking), prioritization, and conflict resolution when stored information contradicts new observations.
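Conflict resolution can take many forms. The sketch below illustrates one simple policy, last-write-wins by observation time, with superseded facts kept for audit. The names `MemoryFact` and `FactStore` are hypothetical and chosen for illustration only; real systems may instead weigh source reliability or ask the user to confirm.

```python
from dataclasses import dataclass, field


@dataclass
class MemoryFact:
    subject: str        # e.g. a user id
    attribute: str      # e.g. "preferred_language"
    value: str
    observed_at: float  # Unix timestamp of the observation


@dataclass
class FactStore:
    # Current belief per (subject, attribute), plus a log of facts that lost.
    current: dict = field(default_factory=dict)
    history: list = field(default_factory=list)

    def observe(self, fact: MemoryFact) -> bool:
        """Record an observation; return True if it became the current belief."""
        key = (fact.subject, fact.attribute)
        existing = self.current.get(key)
        if existing is not None and existing.observed_at > fact.observed_at:
            # The stored fact is newer: keep it and log the stale observation.
            self.history.append(fact)
            return False
        if existing is not None and existing.value != fact.value:
            # Contradiction: the newer observation supersedes the stored one.
            self.history.append(existing)
        self.current[key] = fact
        return True
```

Keeping the superseded value in `history` rather than discarding it lets a later process review how a belief changed over time.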

How it works

A long-term memory system operates through three primary stages: encoding, storage, and retrieval.

During encoding, information from agent interactions is converted into structured or semi-structured records. Chunking strategies break down lengthy interactions into semantically coherent units. These chunks are then embedded using an embedding model, producing dense vector representations that capture semantic meaning independent of surface-level text variation.
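The encoding stage can be sketched as follows. This is a minimal illustration, not a production pipeline: `chunk_text` and `toy_embed` are hypothetical names, and `toy_embed` is a hashed bag-of-words stand-in that only captures word overlap. A real embedding model would be substituted to capture semantic meaning.

```python
import hashlib
import math
import re


def chunk_text(text: str, max_words: int = 40) -> list[str]:
    """Group whole sentences into chunks of roughly max_words words.

    A single sentence longer than max_words becomes its own chunk.
    """
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    chunks, current, count = [], [], 0
    for sentence in sentences:
        n = len(sentence.split())
        if current and count + n > max_words:
            # Adding this sentence would exceed the budget: close the chunk.
            chunks.append(" ".join(current))
            current, count = [], 0
        current.append(sentence)
        count += n
    if current:
        chunks.append(" ".join(current))
    return chunks


def toy_embed(text: str, dim: int = 64) -> list[float]:
    """Stand-in for an embedding model: hashed bag-of-words, L2-normalized."""
    vec = [0.0] * dim
    for token in re.findall(r"[a-z0-9]+", text.lower()):
        # hashlib gives a hash that is stable across processes, unlike hash().
        digest = hashlib.md5(token.encode("utf-8")).digest()
        vec[int.from_bytes(digest[:4], "big") % dim] += 1.0
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec
```

Splitting on sentence boundaries is only one chunking strategy; fixed-size windows with overlap or topic-based segmentation are common alternatives.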

Storage occurs in a persistent backend, typically a vector database, relational database, or knowledge graph. Vector databases (e.g., Pinecone, Weaviate) specialize in similarity search over embeddings; relational systems preserve structured metadata; knowledge graphs capture entity relationships and facts. The choice of storage system depends on query patterns, scale, and how the store is exposed to the agent, for example through a Model Context Protocol server or a similar tool interface.
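As a rough illustration of the relational option, the sketch below persists memory records with Python's standard `sqlite3` module, storing each embedding as a JSON-encoded list. The table layout and the function names `open_store`, `save_memory`, and `load_memories` are assumptions made for this example; a dedicated vector database would replace the JSON column with a native vector index.

```python
import json
import sqlite3
import time


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the memory table. Pass a file path for persistence."""
    conn = sqlite3.connect(path)
    # The embedding column holds a JSON-encoded list of floats.
    conn.execute(
        """CREATE TABLE IF NOT EXISTS memories (
               id INTEGER PRIMARY KEY AUTOINCREMENT,
               user_id TEXT NOT NULL,
               created_at REAL NOT NULL,
               text TEXT NOT NULL,
               embedding TEXT NOT NULL
           )"""
    )
    return conn


def save_memory(conn, user_id: str, text: str, embedding: list) -> int:
    """Insert one memory record and return its row id."""
    cur = conn.execute(
        "INSERT INTO memories (user_id, created_at, text, embedding) "
        "VALUES (?, ?, ?, ?)",
        (user_id, time.time(), text, json.dumps(embedding)),
    )
    conn.commit()
    return cur.lastrowid


def load_memories(conn, user_id: str) -> list:
    """Return (id, text, embedding) tuples for one user, oldest first."""
    rows = conn.execute(
        "SELECT id, text, embedding FROM memories WHERE user_id = ? ORDER BY id",
        (user_id,),
    ).fetchall()
    return [(row_id, text, json.loads(emb)) for row_id, text, emb in rows]
```

The default `":memory:"` path is convenient for experiments but is discarded when the connection closes; cross-session retention requires a file path.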

Retrieval is triggered when an agent requires historical information to inform its reasoning or planning. For dense retrieval, the query is encoded with the same embedding model used at encoding time and compared against the stored vectors; lexical methods such as BM25 match query terms directly against the stored text, and hybrid search combines both rankings. Retrieved candidates are reranked if necessary, then injected into the context window or fed into downstream decision-making logic. Self-RAG techniques allow agents to assess whether retrieved memories are relevant before incorporation.
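One way to combine dense and lexical rankings is reciprocal rank fusion (RRF). The sketch below is illustrative only: it assumes document and query vectors have already been produced by some embedding model, uses a plain-Python Okapi BM25 with commonly used default parameters, and introduces the hypothetical function name `hybrid_search`. Production systems would typically delegate both searches to the storage backend.

```python
import math
import re
from collections import Counter


def tokenize(text):
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Okapi BM25 score of each document for the query."""
    if not docs:
        return []
    tokenized = [tokenize(d) for d in docs]
    avg_len = sum(len(t) for t in tokenized) / len(tokenized)
    # Number of documents containing each term.
    doc_freq = Counter(term for t in tokenized for term in set(t))
    scores = []
    for tokens in tokenized:
        counts = Counter(tokens)
        score = 0.0
        for term in tokenize(query):
            f = counts[term]
            if f == 0:
                continue
            n = doc_freq[term]
            idf = math.log((len(docs) - n + 0.5) / (n + 0.5) + 1.0)
            length_norm = 1 - b + b * len(tokens) / avg_len
            score += idf * f * (k1 + 1) / (f + k1 * length_norm)
        scores.append(score)
    return scores


def ranking(scores):
    """Document indices sorted from highest to lowest score."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)


def hybrid_search(query, query_vec, docs, doc_vecs, top_k=3, rrf_k=60):
    """Fuse dense and BM25 rankings with reciprocal rank fusion."""
    dense = ranking([cosine(query_vec, v) for v in doc_vecs])
    lexical = ranking(bm25_scores(query, docs))
    fused = Counter()
    for ranked in (dense, lexical):
        for rank, doc_index in enumerate(ranked, start=1):
            # Each ranking contributes 1 / (rrf_k + rank) to the document.
            fused[doc_index] += 1.0 / (rrf_k + rank)
    best = sorted(fused, key=lambda i: fused[i], reverse=True)[:top_k]
    return [docs[i] for i in best]
```

RRF uses only rank positions, so the cosine and BM25 scores never need to be put on a common scale. A reranking step, if used, would run on the fused candidates.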

Distinction from related terms

| Term | Distinction |
| --- | --- |
| Context window | The context window is bounded, ephemeral memory available within a single model invocation. Long-term memory persists across sessions and is accessed via retrieval, not held continuously in the model's working memory. |
| Knowledge cutoff | Knowledge cutoff refers to the temporal boundary of training data in a model's weights. Long-term memory is dynamically updated post-training and can store information newer than the model's cutoff. |
| Agent memory | Agent memory is a broader category encompassing both ephemeral working memory and long-term persistent storage. Long-term memory is a specific subset designed for cross-session retention. |
| Retrieval-augmented generation (RAG) | RAG is a technique for augmenting generation with retrieved information. Long-term memory is the storage substrate that RAG retrieves from; RAG does not require long-term memory, but long-term memory is often implemented via RAG-like retrieval. |
| Engram-based parametric personalization | Parametric personalization embeds user-specific information directly into model weights via fine-tuning. Long-term memory stores information separately from model parameters, allowing real-time updates without retraining. |
| Fine-tuning | Fine-tuning updates model parameters to reflect new information. Long-term memory stores information externally, enabling rapid updates and selective retrieval without parameter optimization overhead. |

Examples

In multi-turn conversational AI systems deployed by enterprise support teams, long-term memory stores summaries of previous customer interactions, issue resolutions, and learned preferences. When a customer re-engages weeks later, the system retrieves their interaction history, avoiding repeated information gathering and enabling personalized, context-aware responses. This is typical in commercial agentic chatbot platforms.

Autonomous agent projects such as AutoGPT and BabyAGI illustrate long-term memory as task logs and intermediate results stored in persistent files or databases. Agents query these stores during planning to avoid redundant computation and, in multi-agent setups, to coordinate across parallel workflows. Some systems additionally maintain a knowledge graph of discovered facts, enabling agents to reason over accumulated evidence.

In synthetic data generation and automated evaluation pipelines, long-term memory stores benchmark datasets, past evaluation results, and model card metadata. New evaluation runs retrieve these historical references to compute comparative metrics, detect benchmark contamination, and surface regressions without re-computing baseline results from scratch.
