Agent memory vs Context window
Overview
The context window and agent memory are distinct mechanisms that are frequently conflated. The context window is the bounded amount of text (measured in tokens) that a model can attend to in a single inference call; it is transient, and the model retains nothing from it once the call returns. Agent memory is an external, persistent store that retains information across calls and sessions and is selectively loaded back into the context window when relevant.[1]
The distinction matters for system design: enlarging the context window does not by itself provide persistent memory, and persistent memory does not remove the per-call context limit.
How it works
- Context window: a fixed token budget per request. Everything the model "knows" at inference time must fit within it; content outside the window has no effect on that call.
- Agent memory: typically implemented as an external store (a database, vector index, or file) holding facts, prior interactions, or summaries. A retrieval or selection step (often retrieval-augmented generation, RAG) injects relevant memory into the context window for each call.
Memory thus complements the window: the retrieval step decides what to place into the finite window, and the store persists knowledge that the window alone cannot retain.
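The selection step described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a reference implementation: `MemoryItem`, `count_tokens`, `relevance`, and `build_context` are hypothetical names, tokens are approximated by whitespace-separated words, and relevance is a toy word-overlap score standing in for a real retriever such as an embedding search.

```python
from dataclasses import dataclass


@dataclass
class MemoryItem:
    text: str


def count_tokens(text: str) -> int:
    # Assumption: one token per whitespace-separated word (a real tokenizer differs).
    return len(text.split())


def relevance(query: str, item: MemoryItem) -> int:
    # Toy score: number of distinct lowercase words shared with the query.
    return len(set(query.lower().split()) & set(item.text.lower().split()))


def build_context(query: str, memory: list[MemoryItem], budget: int) -> str:
    """Pack the most relevant memory items, plus the query, into the token budget."""
    remaining = budget - count_tokens(query)
    if remaining < 0:
        raise ValueError("query alone exceeds the context budget")
    selected = []
    # Most relevant items first; ties keep their original order (sort is stable).
    ranked = sorted(memory, key=lambda m: relevance(query, m), reverse=True)
    for item in ranked:
        cost = count_tokens(item.text)
        # Skip irrelevant items and items that no longer fit in the window.
        if relevance(query, item) > 0 and cost <= remaining:
            selected.append(item.text)
            remaining -= cost
    return "\n".join(selected + [query])


memory = [
    MemoryItem("user prefers metric units"),
    MemoryItem("user lives in Lisbon"),
    MemoryItem("project deadline is Friday"),
]
context = build_context("what units does the user prefer", memory, budget=12)
print(context)
```

With a budget of 12 words, only the best-matching item fits alongside the query; the other items stay in the store and have no effect on this call.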
| Property | Context window | Agent memory |
|---|---|---|
| Persistence | Transient (one call) | Persistent (across calls/sessions) |
| Location | Inside the inference call | External store |
| Bounded by | Token limit | Storage capacity |
| Failure mode | Truncation / overflow | Stale or irretrievable memory |
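The truncation failure mode in the table can be illustrated with a small Python sketch. It assumes a simple policy in which the most recent turns are kept and older ones are dropped once the budget is exhausted; the function name `truncate_to_window` and the word-count token estimate are hypothetical, and real systems may truncate differently.

```python
def truncate_to_window(turns: list[str], budget: int) -> list[str]:
    """Keep the most recent turns that fit in the budget; older turns are dropped."""
    kept = []
    used = 0
    for turn in reversed(turns):
        cost = len(turn.split())  # assumption: word count stands in for tokens
        if used + cost > budget:
            break
        kept.append(turn)
        used += cost
    # Restore chronological order for the turns that survived.
    return list(reversed(kept))


turns = ["my name is Ada", "I like tea", "what is my name"]
visible = truncate_to_window(turns, budget=8)
print(visible)
```

Here the earliest turn, which contains the name, falls outside the window, so the final question cannot be answered from the visible context alone.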
A larger context window is not the same as memory: a model with a huge window still starts each new conversation without recall unless an external memory supplies the prior state.
Examples
- Pasting a long document into one prompt uses the context window; in the next session the model has no access to the document unless it is supplied again.
- An assistant that recalls your stated preferences weeks later is using agent memory that re-injects those facts into the window.
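The second example can be sketched as a tiny persistent store whose contents are re-injected at the start of each session. This is a hedged illustration only: `FileMemory` and `start_session` are hypothetical names, a JSON file stands in for whatever database or vector index a real agent would use, and the prompt layout is an arbitrary choice.

```python
import json
import tempfile
from pathlib import Path


class FileMemory:
    """Hypothetical persistent store: a JSON file of key/value facts."""

    def __init__(self, path: Path) -> None:
        self.path = path

    def load(self) -> dict[str, str]:
        # An absent file means nothing has been remembered yet.
        if self.path.exists():
            return json.loads(self.path.read_text(encoding="utf-8"))
        return {}

    def remember(self, key: str, value: str) -> None:
        facts = self.load()
        facts[key] = value
        self.path.write_text(json.dumps(facts), encoding="utf-8")


def start_session(memory: FileMemory, user_message: str) -> str:
    """Build the prompt for a fresh session: stored facts first, then the new message."""
    facts = memory.load()
    lines = [f"{key}: {value}" for key, value in facts.items()]
    return "\n".join(lines + [user_message])


store = FileMemory(Path(tempfile.mkdtemp()) / "memory.json")

# Session 1: the user states a preference, which is written to the store.
store.remember("preferred_language", "Python")

# Session 2: the context window starts empty, so the stored fact is
# re-injected ahead of the new message.
prompt = start_session(store, "Suggest a web framework.")
print(prompt)
```

The preference survives only because it was written to the file; the prompt for the second session is assembled from the store, not from anything the model itself retained.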
References
- [1] Augment Code. "Agent Memory vs. Context Engineering." https://www.augmentcode.com/guides/agent-memory-vs-context-engineering