Evidence of Learned Look-Ahead in a Chess-Playing Neural Network
Erik Jenner, Shreyas Kapur, Vasil Georgiev, Cameron Allen, Scott Emmons, Stuart Russell
TL;DR
The paper probes whether a chess-playing neural network learns look-ahead algorithms rather than relying solely on heuristics. It analyzes the transformer-based policy network of Leela Chess Zero using activation patching, attention-head ablations, and a bilinear probe. It shows that representations of future moves, especially the target square of the 3rd move, causally influence the network's current decision. A key result is that the bilinear probe predicts the optimal 3rd move, two turns ahead, with 92% accuracy on board states where Leela finds a single best line. The work also identifies specific attention heads that move information between the squares of moves at different time steps. It discusses limitations and possible generalizations to other domains. Together, these results are an existence proof of learned look-ahead in a real-world model.
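Activation patching can be illustrated on a toy stand-in for Leela. The sketch below is a minimal illustration under stated assumptions. It is not the paper's code. The 64 square tokens follow the description of Leela, but `square_features`, the attention-like `mix` matrix, the single-square readout, and the random inputs are all hypothetical. For each square, the sketch takes that square's activation from a corrupted board and puts it into the clean forward pass. It then records how much the clean output logit drops. The direction of patching (corrupted activations into the clean run) is also an assumption of this sketch.

```python
import numpy as np

N_SQUARES = 64  # one token per chessboard square, as described for Leela

# Everything below is a HYPOTHETICAL toy model, not Leela: a per-square layer,
# one attention-like mixing step (a fixed matrix `mix`), and a linear readout.


def square_features(x: np.ndarray, A: np.ndarray) -> np.ndarray:
    """Toy first layer: the same map applied to every square token, shape (64, d)."""
    return np.tanh(x @ A)


def logit_from_features(h: np.ndarray, mix: np.ndarray, w: np.ndarray, readout_sq: int) -> float:
    """Toy rest of the network: residual mixing across squares, then a readout at one square."""
    h2 = h + mix @ h  # mix[r, s] != 0 moves information from square s to square r
    return float(h2[readout_sq] @ w)


def patching_effects(x_clean, x_corrupt, A, mix, w, readout_sq):
    """Drop in the clean output logit when a single square's activation is taken from
    the corrupted run (assumed patching direction: corrupted -> clean)."""
    h_clean = square_features(x_clean, A)
    h_corrupt = square_features(x_corrupt, A)
    clean_logit = logit_from_features(h_clean, mix, w, readout_sq)
    effects = np.empty(N_SQUARES)
    for s in range(N_SQUARES):
        h = h_clean.copy()
        h[s] = h_corrupt[s]  # patch exactly one square; all other squares stay clean
        effects[s] = clean_logit - logit_from_features(h, mix, w, readout_sq)
    return effects


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    A = rng.normal(size=(6, 5))
    w = rng.normal(size=5)
    x_clean = rng.normal(size=(N_SQUARES, 6))              # stand-in board encoding
    x_corrupt = x_clean + rng.normal(size=(N_SQUARES, 6))  # stand-in "corrupted" board
    r, f = 12, 28  # hypothetical: readout square and a "future move" square
    mix = np.zeros((N_SQUARES, N_SQUARES))
    mix[r, f] = 1.0  # a single "head" copying information from square f to square r
    eff = patching_effects(x_clean, x_corrupt, A, mix, w, r)
    print("squares with nonzero effect:", np.flatnonzero(np.abs(eff) > 1e-12))
```

In this toy setup, `mix` routes information only from square `f` to the readout square `r`. Only those two squares can then have a nonzero patching effect. This is a simplified version of the signature that combines causal importance of a future-move square with an attention head carrying its information backward.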
Abstract
Do neural networks learn to implement algorithms such as look-ahead or search "in the wild"? Or do they rely purely on collections of simple heuristics? We present evidence of learned look-ahead in the policy network of Leela Chess Zero, the currently strongest neural chess engine. We find that Leela internally represents future optimal moves and that these representations are crucial for its final output in certain board states. Concretely, we exploit the fact that Leela is a transformer that treats every chessboard square like a token in language models, and give three lines of evidence: (1) activations on certain squares of future moves are unusually important causally; (2) we find attention heads that move important information "forward and backward in time," e.g., from squares of future moves to squares of earlier ones; and (3) we train a simple probe that can predict the optimal move 2 turns ahead with 92% accuracy (in board states where Leela finds a single best line). These findings are an existence proof of learned look-ahead in neural networks and might be a step towards a better understanding of their capabilities.
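The probe in point (3) is a bilinear probe. Below is a minimal sketch of what such a probe could look like. It rests on several assumptions. The probe scores every (from, to) square pair with a bilinear form `acts[i] @ W @ acts[j]` over per-square activations. It is trained with softmax cross-entropy on the move two turns ahead. The data are synthetic. The pair parameterization, the layer the activations would come from, and the training details are illustrative assumptions. The paper's exact setup may differ.

```python
import numpy as np

N_SQUARES = 64  # one token per chessboard square, as described for Leela

# ASSUMPTION for illustration: the probe scores every (from, to) square pair with a
# bilinear form over per-square activations, S[i, j] = acts[i] @ W @ acts[j], and the
# label is the move two turns ahead as a (from, to) pair. The paper's setup may differ.


def pair_scores(acts: np.ndarray, W: np.ndarray) -> np.ndarray:
    """Bilinear score for all 64 x 64 (from, to) pairs; acts has shape (64, d)."""
    return acts @ W @ acts.T


def log_probs(acts: np.ndarray, W: np.ndarray) -> np.ndarray:
    """Log-softmax over all 4096 pairs (numerically stable), kept in (64, 64) shape."""
    S = pair_scores(acts, W)
    m = S.max()
    return S - (m + np.log(np.exp(S - m).sum()))


def loss_and_grad(acts, move, W):
    """Cross-entropy of the labelled move and its gradient with respect to W."""
    i, j = move
    L = log_probs(acts, W)
    G = np.exp(L)   # softmax probabilities P
    G[i, j] -= 1.0  # dLoss/dS = P - one_hot(move)
    # dS[i, j]/dW = outer(acts[i], acts[j]), so dLoss/dW = acts.T @ G @ acts
    return float(-L[i, j]), acts.T @ G @ acts


def train_probe(data, d, lr=0.01, epochs=5):
    """Plain SGD starting from W = 0, i.e. a uniform prediction over all pairs."""
    W = np.zeros((d, d))
    for _ in range(epochs):
        for acts, move in data:
            W -= lr * loss_and_grad(acts, move, W)[1]
    return W


def predict_move(acts, W):
    """Highest-scoring (from, to) pair under the probe."""
    i, j = np.unravel_index(np.argmax(pair_scores(acts, W)), (N_SQUARES, N_SQUARES))
    return int(i), int(j)


def mean_loss(data, W):
    return float(np.mean([loss_and_grad(a, m, W)[0] for a, m in data]))


def make_synthetic(n, W_true, rng):
    """HYPOTHETICAL stand-in data: random activations labelled by a hidden bilinear rule."""
    d = W_true.shape[0]
    data = []
    for _ in range(n):
        acts = rng.normal(size=(N_SQUARES, d))
        data.append((acts, predict_move(acts, W_true)))
    return data


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    W_true = rng.normal(size=(8, 8))
    train, held_out = make_synthetic(200, W_true, rng), make_synthetic(100, W_true, rng)
    W = train_probe(train, d=8)
    acc = np.mean([predict_move(a, W) == m for a, m in held_out])
    print(f"loss {mean_loss(train, np.zeros((8, 8))):.2f} -> {mean_loss(train, W):.2f}, "
          f"held-out accuracy {acc:.2f}")
```

Scoring square pairs bilinearly keeps the probe very small: a single `d x d` matrix shared across all 4096 candidate moves. Any accuracy it reaches on these random vectors says nothing about the 92% reported for Leela's real activations.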
