Agent memory vs Context window

From llmref.wiki
Agent memory vs Context window — A context window is the temporary input a model sees per call; agent memory is persistent state retained across calls and sessions.

Overview

Context window and agent memory are distinct mechanisms that are frequently conflated. The context window is the bounded amount of text (measured in tokens) a model can attend to in a single inference call; it is transient and discarded after the call. Agent memory is an external, persistent store that retains information across calls and sessions and is selectively loaded back into the context window when relevant.[1]

The distinction matters for system design: enlarging the context window does not by itself provide persistent memory, and persistent memory does not remove the per-call context limit.

How it works

  • Context window: a fixed token budget per request, typically shared between the input and the generated output. Everything the model "knows" at inference time, beyond what is encoded in its weights, must fit within it; content outside the window has no effect on that call.
  • Agent memory: typically implemented as an external store (a database, vector index, or file) holding facts, prior interactions, or summaries. A retrieval or selection step (often retrieval-augmented generation, RAG) injects relevant memory into the context window for each call.

Memory thus complements the window: it persists knowledge that the window alone cannot retain, while the retrieval step decides which of that knowledge to place into the finite window on each call.
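
The interaction described above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: `MemoryStore`, `build_prompt`, and `count_tokens` are hypothetical names, the word-count tokenizer is a crude stand-in for a real tokenizer, and the word-overlap ranking stands in for whatever retrieval method (for example, vector search) a real system would use.

```python
from dataclasses import dataclass, field


def count_tokens(text: str) -> int:
    """Crude stand-in for a real tokenizer: one token per whitespace-separated word."""
    return len(text.split())


@dataclass
class MemoryStore:
    """Hypothetical in-process store; a real one would be a database or vector index."""
    entries: list[str] = field(default_factory=list)

    def add(self, text: str) -> None:
        self.entries.append(text)

    def search(self, query: str) -> list[str]:
        """Rank entries by the number of words shared with the query (toy relevance score)."""
        query_words = set(query.lower().split())
        scored = [(len(query_words & set(e.lower().split())), e) for e in self.entries]
        ranked = sorted(scored, key=lambda pair: pair[0], reverse=True)
        return [entry for score, entry in ranked if score > 0]


def build_prompt(query: str, store: MemoryStore, budget: int) -> str:
    """Inject the most relevant memories that still fit within the token budget."""
    used = count_tokens(query)
    selected = []
    for entry in store.search(query):
        cost = count_tokens(entry)
        if used + cost > budget:
            continue  # this entry would overflow the window, so it is left out
        selected.append(entry)
        used += cost
    return "\n".join(selected + [query])


store = MemoryStore()
store.add("user prefers metric units")
store.add("user lives in Oslo")
store.add("the project uses Rust")

prompt = build_prompt("which units does the user prefer", store, budget=12)
print(prompt)
```

With a budget of 12 tokens, only the highest-ranked memory fits alongside the query; the store still holds the other entries, but they have no effect on this call.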

Distinction from related terms

Property       Context window               Agent memory
Persistence    Transient (one call)         Persistent (across calls/sessions)
Location       Inside the inference call    External store
Bounded by     Token limit                  Storage capacity
Failure mode   Truncation / overflow        Stale or irretrievable memory
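
The truncation failure mode can be illustrated with a short sketch. It assumes one common strategy, keeping the most recent messages that fit, and uses a hypothetical `fit_to_window` helper with a word-count approximation of tokens; real systems may trim differently.

```python
def count_tokens(text: str) -> int:
    """Crude stand-in for a real tokenizer: one token per whitespace-separated word."""
    return len(text.split())


def fit_to_window(messages: list[str], budget: int) -> list[str]:
    """Keep the most recent messages that fit the budget; older ones are dropped."""
    kept = []
    used = 0
    for message in reversed(messages):  # walk from newest to oldest
        cost = count_tokens(message)
        if used + cost > budget:
            break  # everything older than this point is truncated
        kept.append(message)
        used += cost
    kept.reverse()  # restore chronological order
    return kept


history = [
    "my name is Ada",
    "I am planning a trip",
    "what should I pack",
]

window = fit_to_window(history, budget=9)
print(window)
```

Here the oldest message no longer fits, so the model would not see the user's name on this call. Nothing signals the loss to the model; the dropped text simply is not part of its input.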

A larger context window is not the same as memory: a model with a huge window still starts each new conversation without recall unless an external memory supplies the prior state.

Examples

  • Pasting a long document into one prompt uses the context window; the model has no access to it in the next session unless it is supplied again.
  • An assistant that recalls your stated preferences weeks later is using agent memory that re-injects those facts into the window.
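
The second example can be sketched as follows, assuming the simplest possible store: a JSON file on disk. `PreferenceMemory` and `start_session` are hypothetical names chosen for illustration, and no model is actually called; the sketch only shows how a fact written in one session ends up in the prompt of a later one.

```python
import json
import tempfile
from pathlib import Path


class PreferenceMemory:
    """Hypothetical file-backed memory; the file outlives any single session."""

    def __init__(self, path: Path) -> None:
        self.path = path

    def load(self) -> dict[str, str]:
        if not self.path.exists():
            return {}
        return json.loads(self.path.read_text(encoding="utf-8"))

    def remember(self, key: str, value: str) -> None:
        facts = self.load()
        facts[key] = value
        self.path.write_text(json.dumps(facts), encoding="utf-8")

    def as_context(self) -> str:
        """Render the stored facts as text that can be placed in the context window."""
        return "\n".join(f"{key}: {value}" for key, value in self.load().items())


def start_session(memory: PreferenceMemory, user_message: str) -> str:
    """Build the prompt for a brand-new session: remembered facts first, then the message."""
    context = memory.as_context()
    return f"{context}\n{user_message}" if context else user_message


with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "prefs.json"

    # Session 1: the user states a preference and it is written to the store.
    PreferenceMemory(path).remember("diet", "vegetarian")

    # A later session: a new object with no in-process state, reading the same file.
    later = PreferenceMemory(path)
    prompt = start_session(later, "Suggest a dinner recipe.")

print(prompt)
```

The later session starts with an empty context window; the preference reaches the model only because the store re-injects it into the prompt.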

References

  1. Augment Code. "Agent Memory vs. Context Engineering." https://www.augmentcode.com/guides/agent-memory-vs-context-engineering