Skip to main content

Memory

The whole lesson in one sentence: an agent only remembers what it writes to a file, so memory is a set of typed stores plus a few tools the loop calls to read and write them.

Memory is not one thing

When someone says "the agent needs memory," they usually mean one of four different things and don't know it. One person is picturing vector search over old chats. Another is picturing reusable Python functions. A third means the current scratchpad. If you don't name the type, the conversation stays mush.

The canonical map comes from psychology. Endel Tulving (1972) split human memory into episodic (autobiographical, time and place bound: "what we ate last Sunday") and semantic (general knowledge, detached from time and place: "Brussels is the capital of Belgium"). The access verbs differ: episodic is "I remember that," semantic is "I know that." Squire (1992) added the bracket above them: episodic and semantic together are declarative memory, the stuff you can state. Skills and habits sit in a separate non-declarative branch.

Tulving's deeper point, confirmed later by brain-lesion patients like H.M. (who lost episodic memory but kept semantic) and by fMRI showing different activation for "remember" versus "know," is that memory is not one module. It is a family of systems with different substrates and different access patterns. That single idea is the foundation almost every agent-memory paper builds on.

CoALA: the same map, drawn for agents

Sumers, Yao, Narasimhan and Griffiths turned Tulving and Squire into an agent vocabulary in Cognitive Architectures for Language Agents (CoALA, 2023, TMLR 2024). Four types, each with a home in the architecture.

TypeWhere it livesTypical implementationGood for
WorkingThe context windowScratchpad tokens, current trajectory, tool outputsCurrent reasoning, direct observations
EpisodicA chronological event logVector DB of past runs, Reflexion notes, memory streams"What did we do last session," few-shot from earlier runs
SemanticA knowledge baseRAG over pgvector, a wiki, a knowledge graphFacts, domain knowledge, fixed preferences
ProceduralThe system prompt or a skill libraryCLAUDE.md, a skill library, fine-tuned weightsReusable skills, style guides, standard procedures

Working memory is the context window itself: a finite attention budget. The other three live outside the window and get pulled in by retrieval or just-in-time loading when needed. CoALA does not prescribe one architecture. It gives you the vocabulary to ask the only question that matters: which type of memory are we missing, and what is the right implementation of it?

The failure modes line up one-to-one with the gaps:

Markdown is the durable store

Here is where the field converged in 2025 and 2026. The model remembers only what gets written to disk, and that disk memory is plain, human-readable Markdown, not a vector database. Vector stores did not vanish. They got demoted from primary storage to a derived index.

The rule is: no hidden state. Everything the agent "remembers" sits in plain text files that you can edit, version in git, read as a human or an LLM, and interpret without any framework. If a hallucination lands in memory, you open the file and delete the line. There is no black box to work around.

Why this pattern wins, drawn straight from the people running it in production:

  • Editability is debuggability. A vector store that hallucinates is opaque. A Markdown file with a wrong fact is three lines to edit. The next time the agent reloads context, the bad fact is gone.
  • Version control comes free. git diff shows exactly what memory changed, git blame shows why, git revert rolls it back. No vector DB gives you that.
  • Latency is free. Reading a 200-line Markdown file into context is, in Peter Steinberger's words, "essentially free," and with strictly better recall than vector similarity over the same chunks, because you get the whole file.
  • The format matches the training data. LLMs are trained on enormous amounts of Markdown: GitHub READMEs, docs, technical blogs. It is their native language. A file with clear headings is understood better than a JSON blob or a dense embedding.

Thorsten Ball of Sourcegraph puts the consequence plainly: "we are optimizing code for readability over writability, except now the reader is also an agent." The Markdown maps cleanly onto CoALA: memory/YYYY-MM-DD.md files are episodic, a MEMORY.md of durable facts is semantic, and CLAUDE.md or a skill file is procedural. Procedural is where Markdown is strongest, because it is instruction-level information that has to be readable by both humans and models.

One caveat the research flags. Gloaguen et al. (2026, reported secondhand) found that LLM-generated memory files lowered task success in five of eight settings and added 20 to 23 percent inference cost, while hand-written minimal files improved performance by four percent. The lesson is not "skip memory files." It is: write them by hand, keep them short.

Memory operations are tools inside the loop

If you remember lesson 1, an agent is a model calling tools in a loop. Memory fits that exactly. Reading and writing memory are not a separate subsystem bolted on the side. They are tool calls the model makes mid-loop, the same shape as any other action.

This is the cleanest worked example. MemGPT (the paper) and Letta (the system it became) treat the context window like RAM and external storage like disk, an OS-inspired design. The model edits its own memory through tool calls: memory_insert to write a fact, archival_memory_search to retrieve from long-term storage. When the context window fills, the agent pages relevant content in and out, the way an operating system swaps pages. Letta splits storage into core memory (always in context), recall (recent conversation), and archival (the long tail). Anthropic's own Messages API memory tool follows the same primitive: the model emits tool_use commands (view, create, str_replace, insert, delete) against a virtual /memories folder, and your application decides where that folder actually lives.

The thing to notice: the model decides when to remember and when to recall, by calling a tool, inside the same loop it uses to call every other tool. There is no separate planner managing memory. CoALA names this the structured action space: internal actions on memory sit next to external actions on the world.

Two consolidation techniques are worth knowing, because raw episodic memory grows without bound. A silent turn before compaction runs an internal step that reminds the agent to flush important context to memory files before the window gets compacted, so nothing is lost in the squeeze. Dreaming is a background job (a cron task) that scores candidate memories by importance and promotes only the ones above a threshold into the durable store. That is the same sleep-time consolidation Letta uses. Batch and curated, versus inline auto-memory that writes every turn, unfiltered.

A warning about runtime memory

There is a trap. Many tools now offer "memory" as a hosted feature: Cursor Memories, Copilot Memory, Devin Knowledge. These are runtime memory, vendor-side, with no export format and no standard. Copilot Memory even deletes itself after 28 days. None of it survives a move to another tool. Content memory, the Markdown in your git, survives everything: another machine, another model, another harness. Treat vendor memory as a throwaway cache for this week. Treat the files in git as the asset. The split is the same content-versus-runtime line that runs through every part of the agent stack.

Takeaways

  • Memory is four types, not one: working (the context window), episodic (event log), semantic (facts), procedural (skills and prompts). Name the type before you build it.
  • Markdown on disk is the durable store. Editable, versionable, free to load, native to the model. Vectors are a derived index, not the source of truth.
  • Memory operations are tool calls inside the loop. Letta and the Anthropic memory tool both let the model read and write its own memory through tool_use. No separate planner.
  • Write memory files by hand and keep them short. Auto-generated, bloated files measurably hurt.
  • Vendor runtime memory is a cache, not an asset. Only what is in git survives a move.

Where this goes next

Memory reaches the model through tool calls, which raises the obvious next question: how do you wire tools to an agent in the first place, and how do they travel between harnesses? The next lesson, Tools, MCP and CLI-over-MCP, takes the tool primitive seriously: what MCP standardizes, where it stops (it has no memory primitive, as you just saw), and why a plain command-line tool is often the better interface than a protocol.