Iterative Inference in a Chess-Playing Neural Network
Elias Sandmann, Sebastian Lapuschkin, Wojciech Samek
TL;DR
The paper investigates whether a chess-playing neural network performs iterative, phase-like inference or smooth incremental refinement, extending the logit lens to the Post-LN transformer of Leela Chess Zero. It reports a three-phase progression of capability with depth, including late-layer reversals in which safety priors override tactical solutions found in middle layers, and introduces concept preference analysis as a complementary interpretability tool. The findings suggest that algorithmic computation and learned priors interact across depth to shape the policy, rather than simply sharpening existing representations. The work provides a concrete, domain-specific case for iterative inference in structured decision-making and offers methodological extensions relevant to transformer analysis beyond chess.
Abstract
Do neural networks build their representations through smooth, gradual refinement, or via more complex computational processes? We investigate this by extending the logit lens to analyze the policy network of Leela Chess Zero, a superhuman chess engine. Although playing strength and puzzle-solving ability improve consistently across layers, capability progression occurs in distinct computational phases, with move preferences undergoing continuous reevaluation: move rankings remain poorly correlated with the final output until late layers, and correct puzzle solutions found in middle layers are sometimes overridden. This late-layer reversal is accompanied by concept preference analyses showing that final layers prioritize safety over aggression, suggesting a mechanism by which heuristic priors can override tactical solutions.
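
To make the rank-correlation measurement above concrete, the following is a minimal sketch of the general logit-lens idea, not the authors' implementation. It assumes a toy residual stack with random updates in place of Leela Chess Zero, a single hypothetical readout matrix `readout` standing in for the policy head, and Spearman correlation as the ranking measure; it does not model the Post-LN adaptations the paper introduces. All names (`toy_hidden_states`, `logit_lens`, `spearman`) are illustrative.

```python
import numpy as np

def rank(values):
    """Return 0-based ranks of a 1-D array (assumes no ties)."""
    order = np.argsort(values)
    ranks = np.empty_like(order)
    ranks[order] = np.arange(len(values))
    return ranks

def spearman(a, b):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    ra, rb = rank(a).astype(float), rank(b).astype(float)
    return float(np.corrcoef(ra, rb)[0, 1])

def logit_lens(hidden_states, readout):
    """Decode every layer's hidden state with the same readout matrix."""
    return [h @ readout for h in hidden_states]

def toy_hidden_states(n_layers, d_model, rng):
    """Hypothetical stand-in for a network: each layer adds a random update."""
    h = rng.normal(size=d_model)
    states = []
    for _ in range(n_layers):
        h = h + rng.normal(size=d_model)
        states.append(h.copy())
    return states

rng = np.random.default_rng(0)
n_layers, d_model, n_moves = 8, 32, 20  # illustrative sizes only

readout = rng.normal(size=(d_model, n_moves))  # assumed shared policy readout
states = toy_hidden_states(n_layers, d_model, rng)
layer_logits = logit_lens(states, readout)

# Correlation of each layer's move ranking with the final layer's ranking.
correlations = [spearman(logits, layer_logits[-1]) for logits in layer_logits]
for layer, corr in enumerate(correlations, start=1):
    print(f"layer {layer}: Spearman vs. final = {corr:.3f}")
```

In a real analysis the hidden states would come from the network's intermediate layers on actual positions, and the correlation would be averaged over many positions; a curve that stays low until the last few layers would correspond to the late agreement described in the abstract.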
