Dynamical Systems Theory Behind a Hierarchical Reasoning Model

Vasiliy A. Es'kin; Mikhail E. Smorkalov

Abstract

Current large language models (LLMs) primarily rely on linear sequence generation and massive parameter counts, yet they severely struggle with complex algorithmic reasoning. While recent reasoning architectures, such as the Hierarchical Reasoning Model (HRM) and Tiny Recursive Model (TRM), demonstrate that compact recursive networks can tackle these tasks, their training dynamics often lack rigorous mathematical guarantees, leading to instability and representational collapse. We propose the Contraction Mapping Model (CMM), a novel architecture that reformulates discrete recursive reasoning into continuous Neural Ordinary and Stochastic Differential Equations (NODEs/NSDEs). By explicitly enforcing the convergence of the latent phase point to a stable equilibrium state and mitigating feature collapse with a hyperspherical repulsion loss, the CMM provides a mathematically grounded and highly stable reasoning engine. On the Sudoku-Extreme benchmark, a 5M-parameter CMM achieves a state-of-the-art accuracy of 93.7 %, outperforming the 27M-parameter HRM (55.0 %) and 5M-parameter TRM (87.4 %). Remarkably, even when aggressively compressed to an ultra-tiny footprint of just 0.26M parameters, the CMM retains robust predictive power, achieving 85.4 % on Sudoku-Extreme and 82.2 % on the Maze benchmark. These results establish a new frontier for extreme parameter efficiency, proving that mathematically rigorous latent dynamics can effectively replace brute-force scaling in artificial reasoning.

Paper Structure (25 sections, 44 equations, 8 figures, 11 tables)

Introduction
Current State and Statement of the Problem
Hierarchical Reasoning Model
Tiny Recursion Model
Training
Deep supervision for the HRM
Adaptive computational time (ACT)
Training of TRM
Modifications of the HRM
From Discrete Equations to Neural Ordinary Differential Equations
Modifications of the training and model
Equilibrium points
Repulsion Loss Term
Modifications of StableMax
Neural Stochastic Differential Equations
...and 10 more sections

Figures (8)

Figure 1: Stylized visualizations of neural network models represented as volumetric Aizawa attractor trajectories. The number of model parameters is proportional to the volume of the sphere circumscribed around the corresponding attractor.
Figure 2: Architecture of the Hierarchical Reasoning Model.
Figure 3: Training diagrams for the HRM (a) and the TRM (b).
Figure 4: Training pseudocode for the HRM (left) and the TRM (right).
Figure 5: Evolution of the phase portrait of the dynamical system (\ref{eq19}), (\ref{eq20}) during neural network training.
...and 3 more figures
