Contextual Memory Virtualisation: DAG-Based State Management and Structurally Lossless Trimming for LLM Agents

Cosmo Santoni

TL;DR

This work introduces a three-pass, structurally lossless trimming algorithm that preserves every user message and assistant response verbatim. It strips mechanical bloat such as raw tool outputs, base64 images, and metadata, reducing token counts by a mean of 20%, and by up to 86% in sessions with significant overhead.

Abstract

As large language models engage in extended reasoning tasks, they accumulate significant state -- architectural mappings, trade-off decisions, codebase conventions -- within the context window. This understanding is lost when sessions reach context limits and undergo lossy compaction. We propose Contextual Memory Virtualisation (CMV), a system that treats accumulated LLM understanding as version-controlled state. Borrowing from operating system virtual memory, CMV models session history as a Directed Acyclic Graph (DAG) with formally defined snapshot, branch, and trim primitives that enable context reuse across independent parallel sessions. We introduce a three-pass, structurally lossless trimming algorithm that preserves every user message and assistant response verbatim. It strips mechanical bloat such as raw tool outputs, base64 images, and metadata, reducing token counts by a mean of 20%, and by up to 86% in sessions with significant overhead. A single-user case-study evaluation across 76 real-world coding sessions demonstrates that trimming remains economically viable under prompt caching. The strongest gains appear in mixed tool-use sessions, which average a 39% reduction and reach break-even within 10 turns. A reference implementation is available at https://github.com/CosmoNaught/claude-code-cmv.
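To make the snapshot, branch, and trim primitives concrete, the following is a minimal sketch, not the reference implementation. The message schema (`role`, `content` blocks with a `type`, an optional `metadata` key), the names `SessionGraph`, `Snapshot`, and `STUB`, and the split into one pass per bloat category are all assumptions made for illustration; the abstract does not specify what the three passes are. Each snapshot here has at most one parent, which gives a tree, the simplest special case of a DAG.

```python
import copy
from dataclasses import dataclass
from typing import Optional

STUB = "[trimmed]"  # hypothetical placeholder left where bulk content was removed


def trim(messages: list[dict]) -> list[dict]:
    """Return a trimmed copy; user and assistant text blocks are left untouched."""
    out = copy.deepcopy(messages)
    # Pass 1 (assumed): stub out raw tool outputs.
    for msg in out:
        for block in msg["content"]:
            if block["type"] == "tool_result":
                block["content"] = STUB
    # Pass 2 (assumed): stub out base64 image payloads.
    for msg in out:
        for block in msg["content"]:
            if block["type"] == "image":
                block["data"] = STUB
    # Pass 3 (assumed): drop per-message metadata.
    for msg in out:
        msg.pop("metadata", None)
    return out


@dataclass
class Snapshot:
    id: str
    messages: list[dict]
    parent: Optional[str] = None


class SessionGraph:
    """Snapshots keyed by id; each points to at most one earlier snapshot."""

    def __init__(self) -> None:
        self.nodes: dict[str, Snapshot] = {}

    def snapshot(self, id: str, messages: list[dict],
                 parent: Optional[str] = None) -> Snapshot:
        # Unique ids plus "parent must already exist" keep the graph acyclic.
        if id in self.nodes:
            raise ValueError(f"snapshot {id!r} already exists")
        if parent is not None and parent not in self.nodes:
            raise KeyError(f"unknown parent {parent!r}")
        node = Snapshot(id, copy.deepcopy(messages), parent)
        self.nodes[id] = node
        return node

    def branch(self, snapshot_id: str, trimmed: bool = True) -> list[dict]:
        """Seed a new, independent session from a stored snapshot."""
        messages = self.nodes[snapshot_id].messages
        return trim(messages) if trimmed else copy.deepcopy(messages)

    def lineage(self, snapshot_id: str) -> list[str]:
        """Walk parent links from a snapshot back to its root."""
        chain = []
        current: Optional[str] = snapshot_id
        while current is not None:
            chain.append(current)
            current = self.nodes[current].parent
        return chain
```

Because `branch` always returns a copy, several parallel sessions can start from the same snapshot without affecting each other or the stored state. Trimming only at branch time keeps the stored snapshot complete.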

Paper Structure

This paper contains 16 sections, 4 equations, 5 figures, 3 tables, 1 algorithm.

Figures (5)

  • Figure 1: Context window before (left, 132k message tokens, 76% capacity) and after (right, 2.3k message tokens, 12% capacity) native autocompaction (e.g., in Claude Code). Autocompaction condenses 98% of the accumulated session state into a brief summary to reclaim window space.
  • Figure 2: Distribution of token reduction across 76 sessions, segmented by bloat profile. The median reduction is 12%; the mean is pulled higher (20%) by a tail of sessions with significant trimmable overhead.
  • Figure 3: Break-even turns vs. token reduction. Sessions with >30% reduction reach break-even within 15 turns. Sessions with minimal overhead cluster at the 60-turn cap, correctly indicating trimming is unnecessary.
  • Figure 4: Cumulative input cost with and without trimming. The highlighted session (46% reduction) reaches break-even at turn 6. Faint lines show other sessions; sessions with greater reduction diverge earlier, while sessions with minimal reduction show negligible separation.
  • Figure 5: Context composition by session. Green represents conversation content (preserved by trimming); red, orange, purple, and blue represent trimmable overhead. Sessions with more overhead see larger reductions.
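
The break-even idea behind Figures 3 and 4 can be illustrated with a deliberately simplified cost model. This is a sketch under stated assumptions, not the paper's cost model, which is not reproduced in this section. It assumes the untrimmed context is already cached and read at a cache-read price every turn, that trimming invalidates the cache so the trimmed context is written once at a cache-write price, and that context size stays constant across turns. The function name, parameters, and prices are hypothetical, so the turn counts it yields should not be expected to match the figures. Only the 60-turn cap is taken from the Figure 3 caption.

```python
from typing import Optional


def break_even_turn(
    full_tokens: int,
    trimmed_tokens: int,
    cache_write_price: float,
    cache_read_price: float,
    max_turns: int = 60,
) -> Optional[int]:
    """First turn at which cumulative input cost with trimming is no higher
    than without it, or None if that never happens within max_turns."""
    cost_without = 0.0
    cost_with = 0.0
    for turn in range(1, max_turns + 1):
        # Untrimmed session: the prefix is already cached and is read each turn.
        cost_without += full_tokens * cache_read_price
        # Trimmed session: turn 1 re-writes the cache, later turns read it.
        price = cache_write_price if turn == 1 else cache_read_price
        cost_with += trimmed_tokens * price
        if cost_with <= cost_without:
            return turn
    return None


# Illustrative per-token prices in arbitrary units (assumed, not from the paper).
large_cut = break_even_turn(100_000, 14_000, cache_write_price=1.25, cache_read_price=0.1)
small_cut = break_even_turn(100_000, 54_000, cache_write_price=1.25, cache_read_price=0.1)
no_cut = break_even_turn(100_000, 100_000, cache_write_price=1.25, cache_read_price=0.1)
```

Under this model a larger reduction breaks even earlier (`large_cut` is smaller than `small_cut`), and a session with nothing to trim never recovers the one-time cache-write cost, so `no_cut` is `None`. Both behaviours match the qualitative trend the captions describe.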