Iterative Inference in a Chess-Playing Neural Network

Elias Sandmann, Sebastian Lapuschkin, Wojciech Samek

TL;DR

The paper investigates whether a chess-playing neural network performs iterative, phase-like inference or smooth incremental refinement by extending the logit lens to the Post-LN transformer of Leela Chess Zero. It demonstrates a three-phase progression of capability with depth, including late-layer reversals where safety priors override mid-layer tactical solutions, and introduces concept preference analysis as a complementary interpretability tool. The findings suggest that algorithmic computation and learned priors interact across depth to shape the policy, rather than simply sharpening existing representations. This work provides a concrete, domain-specific case for iterative inference in structured decision-making and offers methodological extensions relevant to transformer-based analysis beyond chess.

Abstract

Do neural networks build their representations through smooth, gradual refinement, or via more complex computational processes? We investigate this by extending the logit lens to analyze the policy network of Leela Chess Zero, a superhuman chess engine. Although playing strength and puzzle-solving ability improve consistently across layers, capability progression occurs in distinct computational phases with move preferences undergoing continuous reevaluation: move rankings remain poorly correlated with final outputs until late, and correct puzzle solutions found in middle layers are sometimes overridden. This late-layer reversal is accompanied by concept preference analyses showing final layers prioritize safety over aggression, suggesting a mechanism by which heuristic priors can override tactical solutions.
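
The core idea of a logit lens, decoding each layer's intermediate activation with the model's final output head, can be illustrated with a minimal NumPy sketch. This is not the paper's method: the extension needed for a Post-LN transformer is not reproduced, a single linear map (`w_policy`, `b_policy`) stands in for Leela Chess Zero's actual policy head, the activations are random toy data, and Spearman rank correlation is only one assumed way to measure how closely a layer's move ranking matches the final one. All names below are hypothetical.

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax over the last axis.
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def logit_lens(hidden_states, w_policy, b_policy):
    # Decode every layer's activation with the same (final) policy head.
    # Assumption: activations can be fed to the head directly.
    return softmax(hidden_states @ w_policy + b_policy)

def spearman(a, b):
    # Spearman rank correlation, assuming no ties in a or b.
    ra = np.argsort(np.argsort(a)).astype(float)
    rb = np.argsort(np.argsort(b)).astype(float)
    return float(np.corrcoef(ra, rb)[0, 1])

# Toy data: one activation vector per layer (hypothetical sizes).
rng = np.random.default_rng(0)
n_layers, d_model, n_moves = 6, 16, 10
hidden_states = rng.normal(size=(n_layers, d_model))
w_policy = rng.normal(size=(d_model, n_moves))
b_policy = np.zeros(n_moves)

policies = logit_lens(hidden_states, w_policy, b_policy)   # (layers, moves)
top_moves = policies.argmax(axis=1)                        # top-ranked move per layer
rank_corr = [spearman(p, policies[-1]) for p in policies]  # agreement with final layer
```

With real activations, a sequence of `rank_corr` values that stays low until the last layers would correspond to the late convergence of move rankings described in the abstract, and changes in `top_moves` across depth would show the top-ranked move being reevaluated.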

Paper Structure

This paper contains 83 sections, 5 equations, 27 figures, 34 tables.

Figures (27)

  • Figure 1: Our extended logit lens reveals progressive policy refinement across transformer layers in Leela Chess Zero. We map intermediate activations to policy distributions for a tactical puzzle. The model's top-ranked move changes at each stage, with the correct solution Ng3+ only emerging as a plausible candidate in the middle layers before becoming the decisive top choice in the final output. Full probabilities are provided in Appendix \ref{app:probs_example_puzzle} and additional examples in Appendix \ref{app:puzzles}.
  • Figure 2: Puzzle-solving performance across layers, stratified by Elo rating. Red dashed lines mark phase boundaries derived from tournament analysis, indicating phase-specific improvement rates.
  • Figure 3: Layer-wise puzzle-solving performance across network depth showing current solve rate, cumulative and first discoveries, and median probability assigned to principal-variation (PV) moves. Background shading indicates the three computational phases identified in tournament analysis.
  • Figure 4: Representative example of solution forgetting. Left: The correct move Rxg7+ maintains high probability through most layers before dropping, while the losing move Kf1 rises from near-zero to become the top choice. Right: Value head evaluations correctly assess both resulting positions.
  • Figure 5: Mean of expected concept deltas $(\Delta c_\ell)$ over positions across layers, measured in centipawns with $95\% \text{ CI}$. Left: King-safety and threat concepts for the moving and opposing sides. Right: Total, material, and residual evaluations. Shaded regions indicate network phases.
  • ...and 22 more figures
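
The layer-wise quantities named in the Figure 3 caption (current solve rate, cumulative discoveries, first discoveries) and the solution forgetting illustrated in Figure 4 can be sketched from a boolean puzzle-by-layer matrix. The definitions below are an assumed reading of the captions, not taken from the paper: a puzzle counts as "solved" at a layer when that layer's decoded policy ranks the correct move first, and "forgotten" means solved at some layer but not at the final one. The function name and the toy matrix are hypothetical.

```python
import numpy as np

def layerwise_puzzle_metrics(solved):
    # solved[i, l] is True if puzzle i is solved when decoding layer l.
    solved = np.asarray(solved, dtype=bool)
    # Fraction of puzzles solved at each layer.
    current = solved.mean(axis=0)
    # ever[i, l]: puzzle i was solved at some layer <= l.
    ever = np.logical_or.accumulate(solved, axis=1)
    cumulative = ever.mean(axis=0)
    # Fraction of puzzles solved for the first time at each layer.
    first = np.diff(cumulative, prepend=0.0)
    # Fraction solved at some layer but not by the final layer.
    forgotten = float((ever[:, -1] & ~solved[:, -1]).mean())
    return current, cumulative, first, forgotten

# Toy example: 4 puzzles, 4 layers.
solved = [
    [0, 0, 1, 1],  # found at layer 2 and kept
    [0, 1, 1, 0],  # found at layer 1, lost at the final layer
    [0, 0, 0, 1],  # found only at the final layer
    [0, 0, 0, 0],  # never solved
]
current, cumulative, first, forgotten = layerwise_puzzle_metrics(solved)
```

In this toy matrix the second puzzle is the forgetting case: it contributes to the cumulative curve from layer 1 onward but not to the final-layer solve rate, so the cumulative curve ends above the current solve rate.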