Neural Paging: Learning Context Management Policies for Turing-Complete Agents

Liang Chen; Qi Liu

TL;DR

This work introduces Neural Paging, a hierarchical architecture that decouples symbolic reasoning from information resource management, and derives a robustness bound (Theorem 4) that quantifies competitive-ratio degradation under policy-dependent access with bounded sensitivity.

Abstract

The proof that Large Language Models (LLMs) augmented with external read-write memory constitute a computationally universal system has established the theoretical foundation for general-purpose agents. However, existing implementations face a critical bottleneck: the finite and costly Context Window, which functions not as infinite memory but as a scarce semantic cache. In this work, we introduce Neural Paging, a hierarchical architecture that decouples symbolic reasoning from information resource management. We formulate the Context Paging Problem (CPP) and propose a lightweight, differentiable Page Controller designed to approximate "Semantic Belady's Optimality": retaining tokens with high future utility under explicit assumptions on access patterns. We provide theoretical analysis showing that, under a bounded context window size $K$, Neural Paging reduces the asymptotic complexity of long-horizon reasoning from quadratic $O(N^2)$ to $O(N \cdot K^2)$, and we derive a robustness bound (Theorem 4) that quantifies competitive-ratio degradation under policy-dependent access with bounded sensitivity. We validate these bounds on synthetic paging traces, confirming that the theoretical guarantees hold and identifying significant slack that motivates learned policies.
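To make the caching view concrete, here is a minimal sketch, assuming unit-size blocks identified by integers and a fully known request trace. The hypothetical `PageController` keeps the eviction decision separate from whatever generates the requests and delegates it to a pluggable utility score: on a miss with a full window, it evicts the resident block with the lowest predicted future utility. With an oracle score (closeness of the next request) it reduces to classical, non-semantic Belady (MIN); with a recency score it reduces to LRU. A learned controller would replace the oracle with a predictor. The class, function names, and traces are illustrative assumptions, not the paper's implementation.

```python
import math
from typing import Callable, List, Sequence

# Hypothetical scorer signature: utility(page, t, trace) -> float.
# Higher scores mean "more worth keeping in the context window".
Utility = Callable[[int, int, Sequence[int]], float]


class PageController:
    """Hypothetical score-based controller (illustrative, not the paper's).

    On a miss with a full window, evict the resident block whose
    utility score is lowest. Blocks are treated as unit-size.
    """

    def __init__(self, capacity: int, utility: Utility) -> None:
        self.capacity = capacity
        self.utility = utility
        self.window: List[int] = []  # resident block IDs (the "cache")
        self.faults = 0

    def access(self, page: int, t: int, trace: Sequence[int]) -> None:
        if page in self.window:
            return  # hit: block is already in context
        self.faults += 1
        if len(self.window) >= self.capacity:
            victim = min(self.window, key=lambda p: self.utility(p, t, trace))
            self.window.remove(victim)
        self.window.append(page)


def oracle_utility(page: int, t: int, trace: Sequence[int]) -> float:
    """Belady oracle: the sooner the next request, the higher the score."""
    for j in range(t + 1, len(trace)):
        if trace[j] == page:
            return -(j - t)
    return -math.inf  # never requested again: best eviction candidate


def recency_utility(page: int, t: int, trace: Sequence[int]) -> float:
    """Backward-looking score (time of last use); evicting the minimum is LRU."""
    for j in range(t - 1, -1, -1):
        if trace[j] == page:
            return j
    return -math.inf


def count_faults(trace: Sequence[int], capacity: int, utility: Utility) -> int:
    """Replay a trace through a fresh controller and return its fault count."""
    ctrl = PageController(capacity, utility)
    for t, page in enumerate(trace):
        ctrl.access(page, t, trace)
    return ctrl.faults


if __name__ == "__main__":
    trace = [0, 1, 2, 3] * 5  # cycles over capacity + 1 blocks: a bad case for LRU
    opt = count_faults(trace, 3, oracle_utility)
    lru = count_faults(trace, 3, recency_utility)
    print(opt, lru, round(lru / opt, 2))  # 9 20 2.22
```

On this cyclic trace the oracle incurs 9 faults and LRU 20, an empirical ratio of about 2.22 with a 3-block window, below the classical worst-case competitive factor $K$ (here 3) for LRU. The linear scans in the two scoring functions keep the sketch short; a practical version would precompute next-use and last-use indices.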

Paper Structure (52 sections, 12 theorems, 16 equations, 5 figures, 2 tables, 1 algorithm)

Introduction
Related Work
Memory-Augmented Language Models
Context Extension Techniques
Retrieval-Augmented Generation (RAG)
LLM Agents and Operating System Analogies
Learnable Cache Management
Paging Theory and Competitive Analysis
Theoretical Framework
Preliminaries and Notation
The Agent as a Computational System
Access Model and Assumptions
Assumption Robustness and Relaxations
Estimating $\beta$ in Practice
The Context Paging Problem
...and 37 more sections

Key Result

Theorem 1: Turing Completeness of Memory-Augmented LLMs

Let $\mathcal{M}$ be a MALA (Memory-Augmented Language Agent, Definition 1) with external memory size $M = \omega(1)$ that grows with the input length, and with retrieval satisfying Assumption 3. Then $\mathcal{M}$ can simulate any Turing machine $\mathrm{TM}$. If the TM runs in $T_{\mathrm{TM}}(n)$ steps using $S(n)$ tape cells, the simulation requires …

Figures (5)

Figure 1: H-NTM System Architecture (schematic). The Main Agent (LLM) focuses on token generation. The Page Controller monitors activations and manages data flow between the Context Window (Cache) and External Memory (Disk).
Figure 2: Context as Cache Hierarchy (schematic). The Context Window acts as an L1/L2 cache and requires management strategies distinct from those of the massive External Knowledge Base.
Figure 3: Neural Paging Workflow (schematic). As reasoning progresses from $t$ to $t{+}2$, old blocks are evicted and new blocks are prefetched, maintaining a dynamic window of semantic relevance.
Figure 4: (a) Fault rate vs. cache size on Zipf traces ($M{=}64$, $T{=}5{,}000$). Belady is optimal; LRU is the best of the online heuristics tested. LFU suffers from frequency poisoning on shifting working sets. (b) Empirical competitive ratio vs. worst-case bound $K_b$ (Theorem 3). On structured traces, online algorithms perform far better than the worst-case bound, with LRU at ${\approx}1.9\times$ the optimal fault count. Error bars: $\pm 1$ s.d. over 10 seeds.
Figure 5: Theorem 4 validation ($K_b{=}8$, $T{=}5{,}000$). (a) Fault stability: empirical $|F_A(r^\beta) - F_A(r^0)|$ vs. reference line $\beta T$. The empirical values slightly exceed $\beta T$ at small $\beta$ (cascade effect), but remain well within the corrected $(K_b{+}1)\beta T$ bound. (b) Theorem 4 bound (red dashed) vs. actual LRU faults (orange). The bound holds with large slack, indicating room for tighter instance-dependent analysis. Error bars: $\pm 1$ s.d. over 10 seeds.
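Figure 5(a) compares the fault difference $|F_A(r^\beta) - F_A(r^0)|$ with the reference lines $\beta T$ and $(K_b{+}1)\beta T$. A minimal sketch of how such a measurement might be set up for $A$ = LRU follows. It takes $K_b{=}8$ and $T{=}5{,}000$ from the caption and borrows 64 distinct blocks from Figure 4's $M{=}64$. The independent Zipf requests (exponent `s=1.0`) and the perturbation model, in which each request is replaced by a uniformly random block with probability $\beta$, are assumptions of this sketch. It treats the perturbation as exogenous noise and does not model the policy-dependent access of Theorem 4, so its numbers should not be read as reproducing the figure.

```python
import random
from collections import OrderedDict
from typing import List, Sequence, Tuple


def lru_faults(trace: Sequence[int], k: int) -> int:
    """Count demand-paging faults of LRU with a window of k unit-size blocks."""
    cache: OrderedDict = OrderedDict()
    faults = 0
    for page in trace:
        if page in cache:
            cache.move_to_end(page)  # mark as most recently used
        else:
            faults += 1
            if len(cache) >= k:
                cache.popitem(last=False)  # evict least recently used
            cache[page] = None
    return faults


def zipf_trace(num_pages: int, length: int, s: float, rng: random.Random) -> List[int]:
    """Independent requests with P(page i) proportional to 1 / (i + 1) ** s (assumed model)."""
    weights = [1.0 / (i + 1) ** s for i in range(num_pages)]
    return rng.choices(range(num_pages), weights=weights, k=length)


def perturb(trace: Sequence[int], beta: float, num_pages: int,
            rng: random.Random) -> List[int]:
    """Hypothetical perturbation (not from the paper): with probability beta,
    replace a request by a uniformly random page."""
    return [rng.randrange(num_pages) if rng.random() < beta else p for p in trace]


def sensitivity(k_b: int, T: int, beta: float, num_pages: int = 64,
                s: float = 1.0, seed: int = 0) -> Tuple[int, float, float]:
    """Return (|F(r^beta) - F(r^0)|, beta * T, (k_b + 1) * beta * T) for LRU."""
    rng = random.Random(seed)
    base = zipf_trace(num_pages, T, s, rng)
    noisy = perturb(base, beta, num_pages, rng)
    diff = abs(lru_faults(noisy, k_b) - lru_faults(base, k_b))
    return diff, beta * T, (k_b + 1) * beta * T


if __name__ == "__main__":
    for beta in (0.001, 0.01, 0.05):
        diff, ref, bound = sensitivity(k_b=8, T=5000, beta=beta)
        print(f"beta={beta}: |dF|={diff}, beta*T={ref:.0f}, (K_b+1)*beta*T={bound:.0f}")
```

Repeating the measurement over several `seed` values and reporting the mean and standard deviation would mirror the caption's 10-seed error bars.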

Theorems & Definitions (53)

Definition 1: Memory-Augmented Language Agent
Definition 2: Agent Configuration
Remark: Markov State vs. Observation
Theorem 1: Turing Completeness of Memory-Augmented LLMs
Proof Sketch
Remark
Definition 3: Requested Block
Remark: Operational Approximation
Definition 3b: Approximate Requested Block
Lemma 1b: Approximate Request Error Bound
...and 43 more
