Recurrence-Complete Frame-based Action Models

Michael Keiblinger

TL;DR

This work contends that true sequential computation is essential for long-horizon perception and agentic control, and it introduces the notions of true depth and recurrence-completeness to formalize this claim. It proves that architectures with fully parallelizable forward/backward passes or input aggregation cannot be recurrence-complete, and it identifies input-length proportionality and input aggregation criticality as operational limits. To test these ideas, it develops two benchmarks that induce serial constraints, the Forward-Referencing Jumps Task and Withheld Maze Position Tracking; on both, time-parallel models degrade as the required depth grows, while a lightweight LSTM remains robust. The paper then proposes Recurrence-Complete Frame-based Action Models, which combine a frame-head with an LSTM backbone, and reports a robust power-law relationship between training sequence length and loss at a fixed parameter count, with wall-time amortization allowing longer sequences to outperform shorter ones over time. Overall, the theory and experiments motivate investing in recurrence-enabled architectures for long-horizon, side-effect-rich tasks and point toward practical, frame-based designs suited to streaming, serial computation in real-world settings.

Abstract

In recent years, attention-like mechanisms have been used to great success in the space of large language models, unlocking scaling potential to a previously unthinkable extent. "Attention Is All You Need" famously claims RNN cells are not needed in conjunction with attention. We challenge this view. In this paper, we point to existing proofs that architectures with fully parallelizable forward or backward passes cannot represent classes of problems specifically interesting for long-running agentic tasks. We further conjecture a critical time t beyond which non-recurrence-complete models fail to aggregate inputs correctly, with concrete implications for agentic systems (e.g., software engineering agents). To address this, we introduce a recurrence-complete architecture and train it on GitHub-derived action sequences. Loss follows a power law in the trained sequence length while the parameter count remains fixed. Moreover, longer-sequence training always amortizes its linearly increasing wall-time cost, yielding lower loss as a function of wall time.
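A power-law relationship of this kind can be checked on one's own measurements with a straight-line fit in log-log space. The following is a minimal sketch, assuming the form loss ≈ a · T^(-b) for training sequence length T; the abstract does not state the exact parameterization, the function name `fit_power_law` is hypothetical, and the data points are synthetic, not results from the paper.

```python
import math

def fit_power_law(seq_lens, losses):
    """Least-squares fit of loss ~ a * T**(-b) via a line in log-log space.

    Assumed functional form; returns the pair (a, b).
    """
    xs = [math.log(t) for t in seq_lens]
    ys = [math.log(l) for l in losses]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx                    # equals -b under the assumed form
    intercept = mean_y - slope * mean_x  # equals log(a)
    return math.exp(intercept), -slope

# Synthetic illustration only: points generated from a = 2.0, b = 0.25.
seq_lens = [16, 32, 64, 128, 256]
losses = [2.0 * t ** -0.25 for t in seq_lens]
a, b = fit_power_law(seq_lens, losses)
```

On real measurements the points will not lie exactly on a line, so inspecting the residuals of the log-log fit is a reasonable first check before reading much into the exponent.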

Paper Structure

This paper contains 53 sections, 3 theorems, 16 equations, 31 figures, 4 tables.

Key Result

Theorem 1

Assume the architecture is recurrence-complete: for any function $g:\mathcal{H}^k\times\mathcal{X}\to\mathcal{H}$ there exists a program in the architecture computing $h_t = g(h_{t-1},\ldots,h_{t-k},x_t)$. Regard $g$ as an opaque primitive (no algebraic identities are assumed beyond extensional equa
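The recurrence in the theorem's hypothesis, $h_t = g(h_{t-1},\ldots,h_{t-k},x_t)$, can be illustrated with a short loop. This is a minimal sketch that treats $g$ as an opaque callable, as the statement does; the helper name `run_recurrence` and the toy choice of `g` are hypothetical and are not taken from the paper.

```python
from collections import deque

def run_recurrence(g, initial_history, inputs):
    """Evaluate h_t = g((h_{t-1}, ..., h_{t-k}), x_t) one step at a time.

    initial_history is ordered newest first and its length fixes k.
    g is only ever called, never inspected, so each step has to wait
    for the result of the step before it.
    """
    history = deque(initial_history, maxlen=len(initial_history))
    outputs = []
    for x_t in inputs:
        h_t = g(tuple(history), x_t)
        history.appendleft(h_t)  # a full deque drops the oldest state
        outputs.append(h_t)
    return outputs

# Toy g with k = 2 (hypothetical): sum of the two previous states plus the input.
def toy_g(prev, x):
    return prev[0] + prev[1] + x

states = run_recurrence(toy_g, (1, 0), [0, 0, 0, 0, 0])
# states == [1, 2, 3, 5, 8]
```

The toy `g` happens to be linear, so it could be reorganized algebraically; the loop itself makes no use of that and would run unchanged for any callable `g`.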

Figures (31)

  • Figure 1: A sequence of instructions with strict data-dependence.
  • Figure 2: Transformer validation accuracy as a function of layer count for different maximum depths.
  • Figure 3: Mamba validation accuracy as a function of layer count for different maximum depths.
  • Figure 4: LSTM validation accuracy as a function of maximum depth.
  • Figure 5: Visualization of the maze used in all experiments.
  • ...and 26 more figures

Theorems & Definitions (8)

  • Theorem 1
  • proof
  • Remark 1
  • Lemma 1
  • proof
  • Theorem 2
  • proof
  • Remark 2