Understanding the learned look-ahead behavior of chess neural networks

Diogo Cruz

TL;DR

This work investigates whether the Leela Chess Zero policy network exhibits genuine look-ahead planning beyond the immediate move. By extending mechanistic interpretability methods to longer horizons and multiple branches, the authors demonstrate context-dependent look-ahead up to seven moves and reveal specialized attention heads that encode and manipulate future-state information in a pattern-driven, rather than purely timing-based, manner. The study combines activation patching, probing, and ablation to show that multiple future move sequences are considered and that some heads (notably L12H12 and L12H17) specialize in different tactical contexts (e.g., checkmate vs non-checkmate). These findings advance our understanding of emergent planning-like capabilities in strategic neural networks and illustrate how interpretability techniques can uncover cognitive-like processes in AI systems, with implications for designing robust planning in complex domains.

Abstract

We investigate the look-ahead capabilities of chess-playing neural networks, specifically focusing on the Leela Chess Zero policy network. We build on the work of Jenner et al. (2024) by analyzing the model's ability to consider future moves and alternative sequences beyond the immediate next move. Our findings reveal that the network's look-ahead behavior is highly context-dependent, varying significantly based on the specific chess position. We demonstrate that the model can process information about board states up to seven moves ahead, utilizing similar internal mechanisms across different future time steps. Additionally, we provide evidence that the network considers multiple possible move sequences rather than focusing on a single line of play. These results offer new insights into the emergence of sophisticated look-ahead capabilities in neural networks trained on strategic tasks, contributing to our understanding of AI reasoning in complex domains. Our work also showcases the effectiveness of interpretability techniques in uncovering cognitive-like processes in artificial intelligence systems.

Paper Structure

This paper contains 37 sections, 1 equation, 34 figures.

Figures (34)

  • Figure 1: Examples of 3-move puzzles in puzzle sets 112 (left) and 123 (right). "1st", "2nd", and "3rd" mark the move order, with the green (resp. red) arrow indicating the optimal move of the player (resp. opponent). The squares the pieces move to are marked in blue and numbered sequentially, starting from 1. The resulting number sequence labels the associated puzzle set: for example, 1st move $\mapsto$ square 1, 2nd move $\mapsto$ square 1, 3rd move $\mapsto$ square 2 gives the set 112. For these two examples, the optimal move sequence (i.e. the principal variation) results in a checkmate, which may be marked with the prefix M, so these examples additionally belong to the subsets M112 and M123, respectively.
  • Figure 2: Log odds reduction of the correct move as a result of activation patching, for 5-move puzzle sets of the form 112XY, where $Y>2$ (i.e. the fifth move square is distinct from the first and third move squares). "Corrupted" indicates the square where a piece was moved or removed on the corrupted board, compared to the original board. Higher values indicate greater importance of that square for the model's decision. The label $i$ indicates the destination square of the $i$-th move, with solid (resp. dashed) lines marking player (resp. opponent) moves. "Other" indicates the contributions of the remaining squares. Confidence intervals of 50% and 90% are displayed using darker and lighter hues, respectively, indicating the distribution of the log odds reduction across the puzzles considered. As expected, for the early layers, the original and corrupted boards are differentiated by the content of the corrupted square, so its effect dominates. For the final layers, the model has decided which move to make next, so the next move square dominates. For the middle layers, more complex dynamics emerge.
  • Figure 3: Probing the model's residual stream for the puzzle set 1123456. The probe's accuracy decreases as we look into more distant future move squares, with the accuracy for the 7th move square being considerably lower, but still non-negligible when compared with the probe's accuracy on a random model. The observed accuracy increases as we traverse the model's layers, as the residual stream contains the move information in a way that is progressively easier to decode. The sharp dropoff at the last layer likely stems from the policy and value heads making little use of future move information, relying more strongly on the next move information instead (see \ref{sec:probing_effects_extra}).
  • Figure 4: Attention head patching results for puzzles with 3 moves. Darker tones indicate a higher log odds reduction of the correct move due to patching, with the highest being 0.73, for L12H12 (head 12 in layer 12) in set 112. The letters K, B, and R represent the king, bishop, and rook attention heads, respectively, identified in Jenner et al. (2024).
  • Figure 5: Ablation results of the L12H12 head for the checkmate (M112, left) and non-checkmate (N112, right) subsets of puzzle set 112. We note that head L12H12 not only appears to mainly move information "backward in time", i.e. from the third to the first move square, but also appears to be especially critical in scenarios that explicitly result in a checkmate (in this case, in 3 moves).
  • ...and 29 more figures
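
The puzzle-set labels described in the Figure 1 caption, and the log odds reduction reported in Figures 2 and 4, can be illustrated with a minimal Python sketch. It is not the authors' code. The function names and the example squares are hypothetical, and two points are assumptions: that squares are numbered in order of first appearance, and that the log odds reduction is the logit of the correct move's probability on the clean run minus its logit after patching.

```python
import math
from typing import Optional, Sequence


def puzzle_set_label(move_squares: Sequence[str],
                     checkmate: Optional[bool] = None) -> str:
    """Label a puzzle from the destination squares of its principal variation.

    Assumption: each new destination square receives the next integer,
    starting from 1, and a repeated square reuses its earlier number.
    """
    index_of = {}  # destination square -> number assigned on first appearance
    digits = []
    for square in move_squares:
        if square not in index_of:
            index_of[square] = len(index_of) + 1
        digits.append(str(index_of[square]))
    label = "".join(digits)
    # Optional subset prefix: M for checkmate, N for non-checkmate puzzles.
    if checkmate is True:
        return "M" + label
    if checkmate is False:
        return "N" + label
    return label


def log_odds(p: float) -> float:
    """Logit of a probability strictly between 0 and 1."""
    return math.log(p / (1.0 - p))


def log_odds_reduction(p_clean: float, p_patched: float) -> float:
    """Assumed definition: logit on the clean run minus logit after patching."""
    return log_odds(p_clean) - log_odds(p_patched)


# Hypothetical squares: the 1st and 2nd moves land on the same square.
print(puzzle_set_label(["g7", "g7", "h8"], checkmate=True))  # M112
print(puzzle_set_label(["d8", "g8", "f7"]))                  # 123
print(log_odds_reduction(0.9, 0.5))  # positive: patching hurt the correct move
```

Under this convention, a larger log odds reduction means the patched component mattered more for the correct move, which matches how the captions read higher values and darker tones.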