Mechanistic Foundations of Goal-Directed Control

Alma Lago

Abstract

Mechanistic interpretability has transformed the analysis of transformer circuits by decomposing model behavior into competing algorithms, identifying phase transitions during training, and deriving closed-form predictions for when and why strategies shift. However, this program has remained largely confined to sequence-prediction architectures, leaving embodied control systems without comparable mechanistic accounts. Here we extend this framework to sensorimotor-cognitive development, using infant motor learning as a model system. We show that foundational inductive biases give rise to causal control circuits, with learned gating mechanisms converging toward theoretically motivated uncertainty thresholds. The resulting dynamics reveal a clean phase transition in the arbitration gate whose commitment behavior is well described by a closed-form exponential moving-average surrogate. We identify the context window $k$ as the critical parameter governing circuit formation: below a minimum threshold ($k \leq 4$) the arbitration mechanism cannot form; above it ($k \geq 8$), gate confidence scales asymptotically as $\log k$. A two-dimensional phase diagram further reveals task-demand-dependent route arbitration consistent with the prediction that prospective execution becomes advantageous only when prediction error remains within the task tolerance window. Together, these results provide a mechanistic account of how reactive and prospective control strategies emerge and compete during learning. More broadly, this work sharpens mechanistic accounts of cognitive development and provides principled guidance for the design of interpretable embodied agents.

Paper Structure (15 sections, 8 equations, 7 figures, 2 tables)

Figures (7)

  • Figure 1: A parsimonious computational model of infant sensorimotor–cognitive development. The architecture unfolds through three consolidation routes: (a) pre-motor causal learning, where causal representations ($h^*$) form early, even as motor commands are issued in a weakly supervised loop; (b) sensorimotor mastery, where a transition function (T) enables coherent sensory integration and prediction-based control; and (c) autonomous behavior, where previously learned components consolidate into self-regulated behavior, with belief ($b$) tracking internal goals. All stages share the same basic modules, namely sensory (S), control (C), and motor (M), but differ in representational depth and internal coordination. These correspond to the reactive (black), prospective (blue), and associative (green) routes shown below. (d) The developmental timeline maps architectural transitions to infant motor milestones. The nervous system shifts from predominantly afferent processing (birth to 2 months, when sensory input shapes behavior) through a transition zone (2–4 months) to predominantly efferent processing (4 months onward, goal-directed action).
  • Figure 2: Experimental interface. The protocol is framed as a continuous control problem. The observable scene (goal and cursor) and hidden actuation (elbow joint) enforce a decoupled perception–action loop. The agent drives its motor command to reduce the gap between goal and cursor over time. The goal resets to a new random position on every trial. Task demand jointly modulates target distance and tolerance radius. Varying this parameter alongside training duration allows exploration of distinct control regimes (visualized in Appendix \ref{exp-setup-details}).
  • Figure 3: MAIA overview. Three consolidation routes organize control: reactive (black), prospective (blue), and associative (green, deferred). As learning progresses, uncertainty gates arbitrate route transitions based on accumulated confidence. Foundational inductive biases (object-centric perception, event-based representations, body schema) provide structural constraints that enable interpretable control pathways.
  • Figure 4: Context window threshold governs arbitration emergence. Phase diagrams across $k \in \{1, 2, 4, 8, 16, 32\}$ show gate confidence over training epochs (x-axis) and task demand (y-axis). For $k \leq 4$, no arbitration structure forms: gates remain at chance across all conditions (white/gray). At $k=8$ structure begins to emerge; at $k=32$ the phase diagram is fully resolved, with prospective control (blue) dominating at low task demand and reactive control (gray) at high demand. This progression reveals $k$ as the critical architectural parameter governing circuit formation, consistent with the theoretical threshold $k \geq K_{\text{steps}} \approx 10$.
  • Figure 5: Gate learning dynamics reveal EMA structure. Confidence trajectories across context windows $k$ (solid lines) converge to exponential moving average predictions (dashed lines). During warm exploration, confidence remains at chance (0.5) across all $k$. Following temperature collapse ($\tau \to 0.001$), commitment emerges at a rate governed by $k$, revealing a clear two-phase structure analogous to induction head formation in transformer training (Olsson et al., 2022). Window size $k=32$ (bold) exhibits the clearest phase separation.
  • ...and 2 more figures
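
The exponential moving-average behavior described for Figure 5 can be illustrated with a generic sketch. This is not the paper's surrogate, whose exact form is not given in this excerpt. The function names, the smoothing rate $\alpha = 2/(k+1)$ (the common "span" convention), and the commitment target of 1.0 are all assumptions chosen for illustration; only the chance-level starting value of 0.5 comes from the caption. The sketch shows that an iterated EMA update has a closed-form trajectory, which is what makes such a surrogate analytically convenient.

```python
def ema_confidence(k, steps, start=0.5, target=1.0):
    """Iterate a generic EMA update: c <- c + alpha * (target - c).

    ASSUMPTION: alpha = 2 / (k + 1) is the common "span" convention;
    the paper's own dependence of the rate on k is not specified here.
    ASSUMPTION: target = 1.0 stands in for full commitment.
    """
    alpha = 2.0 / (k + 1)
    c = start
    trajectory = [c]
    for _ in range(steps):
        c += alpha * (target - c)
        trajectory.append(c)
    return trajectory


def ema_closed_form(k, t, start=0.5, target=1.0):
    """Closed form of the same recursion after t updates:
    c_t = target - (target - start) * (1 - alpha)^t.
    """
    alpha = 2.0 / (k + 1)
    return target - (target - start) * (1.0 - alpha) ** t


if __name__ == "__main__":
    # Compare the iterated and closed-form values for a few window sizes.
    for k in (8, 16, 32):
        iterated = ema_confidence(k, steps=50)[-1]
        closed = ema_closed_form(k, 50)
        print(k, iterated, closed)
```

Under these assumptions the confidence rises monotonically from 0.5 toward the target, and the iterated and closed-form values agree up to floating-point error. How the rate and the asymptote actually depend on $k$ in the paper should be taken from its own equations rather than from this sketch.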