Human-aligned Chess with a Bit of Search

Yiming Zhang, Athul Paul Jacob, Vivian Lai, Daniel Fried, Daphne Ippolito

TL;DR

Allie is a human-aligned chess AI trained on real-game trajectories to capture not only moves but also human-like pondering and resignation behaviors. A decoder-only Transformer predicts moves, think time, and board value; at inference, a time-adaptive Monte-Carlo tree search (MCTS) calibrates playing strength across Elo levels. In a large-scale online study against players rated 1000–2600 Elo, Allie with adaptive search reaches near-parity with its opponents, with a mean skill gap of about 49 Elo, demonstrating that human-like behavior and strength can be learned solely from human data. This work advances human-AI collaboration by integrating human-inspired reasoning into search and learning for a classic, complex decision task.

Abstract

Chess has long been a testbed for AI's quest to match human intelligence, and in recent years, chess AI systems have surpassed the strongest humans at the game. However, these systems are not human-aligned; they are unable to match the skill levels of all human partners or model human-like behaviors beyond piece movement. In this paper, we introduce Allie, a chess-playing AI designed to bridge the gap between artificial and human intelligence in this classic game. Allie is trained on log sequences of real chess games to model the behaviors of human chess players across the skill spectrum, including non-move behaviors such as pondering times and resignations. In offline evaluations, we find that Allie exhibits human-like behavior: it outperforms the existing state-of-the-art in human chess move prediction and "ponders" at critical positions. The model learns to reliably assign reward at each game state, which can be used at inference as a reward function in a novel time-adaptive Monte-Carlo tree search (MCTS) procedure, where the amount of search depends on how long humans would think in the same positions. Adaptive search enables remarkable skill calibration; in a large-scale online evaluation against players with ratings from 1000 to 2600 Elo, our adaptive search method leads to a skill gap of only 49 Elo on average, substantially outperforming search-free and standard MCTS baselines. Against grandmaster-level (2500 Elo) opponents, Allie with adaptive search exhibits the strength of a fellow grandmaster, all while learning exclusively from humans.
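
The idea that search effort should track human think time can be illustrated with a minimal sketch. The abstract only states that the amount of search depends on the predicted pondering time; the linear mapping, the function name `adaptive_rollouts`, and the parameters `sims_per_second`, `min_sims`, and `max_sims` below are all hypothetical choices for illustration, not the paper's formula.

```python
import math


def adaptive_rollouts(predicted_think_seconds: float,
                      sims_per_second: float = 50.0,
                      min_sims: int = 1,
                      max_sims: int = 1000) -> int:
    """Map a predicted human think time to an MCTS rollout budget.

    ASSUMPTION: a clamped linear rule. The source only says the budget
    depends on predicted pondering time, not how.
    """
    if math.isnan(predicted_think_seconds) or predicted_think_seconds < 0:
        raise ValueError("predicted think time must be a non-negative number")
    # Longer predicted pondering means more simulations.
    budget = int(predicted_think_seconds * sims_per_second)
    # Clamp so quick moves still get one rollout and long thinks stay bounded.
    return max(min_sims, min(max_sims, budget))


if __name__ == "__main__":
    for seconds in (0.0, 2.0, 10.0, 100.0):
        print(seconds, adaptive_rollouts(seconds))
```

Under this kind of rule, positions where a human would move instantly receive almost no search, while positions that invite a long think receive a larger budget, which is the qualitative behavior the abstract describes.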

Paper Structure (29 sections, 3 equations, 10 figures, 9 tables)

Figures (10)

  • Figure 1: (a) The current game state can be represented as the sequence of moves that produced it. This sequence, which also includes metadata on the players' skill and the time setting (e.g. a blitz game), is fed to a Transformer, which predicts the next move, the pondering time for this move, and a value assessment of the move. (b) At inference time, we employ Monte-Carlo Tree Search with the value predictions from the model. The number of rollouts $N_\mathrm{sim}$ is chosen dynamically based on the predicted pondering time.
  • Figure 2: Adaptive search enables matching human moves at expert levels. Move-matching accuracy of Allie-Policy, Allie-Adaptive-Search, Maia and GPT-3.5 is reported across skill levels. Allie-Search has virtually the same move-matching accuracy as Allie-Adaptive-Search and is omitted from the figure.
  • Figure 3: Allie's time predictions are strongly correlated with ground-truth human time usage. In the figure, we show the median and interquartile range of Allie's predicted think time for different amounts of time spent by humans and observe a clear monotonic relationship.
  • Figure 4: Allie learns to assign reliable value estimates to board states by observing game outcomes alone. We report Pearson's $r$ correlation of value estimates by Allie and Stockfish with game outcomes. Game outcomes are increasingly predictable as the game progresses.
  • Figure 5: Search-free methods fail to match the skill level of strong players. We estimate the difference in strength between various systems and online players. Values close to 0 indicate good skill calibration.
  • ...and 5 more figures
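
For readers unfamiliar with how a strength difference in Elo is read off game results, as in the Figure 5 caption, the following sketch uses the standard Elo expected-score formula and its inverse. It is an illustration only: the captions do not state which estimator the authors use, and the function names `expected_score` and `elo_gap_from_score` are hypothetical.

```python
import math


def expected_score(rating_diff: float) -> float:
    """Standard Elo expected score for a player rated `rating_diff` above the opponent."""
    return 1.0 / (1.0 + 10.0 ** (-rating_diff / 400.0))


def elo_gap_from_score(score: float) -> float:
    """Invert the Elo formula: rating difference implied by an average score.

    `score` is the mean result per game (win = 1, draw = 0.5, loss = 0).
    ASSUMPTION: a single pooled score; this is not necessarily the
    estimator used in the paper.
    """
    if not 0.0 < score < 1.0:
        raise ValueError("score must lie strictly between 0 and 1")
    return 400.0 * math.log10(score / (1.0 - score))


if __name__ == "__main__":
    # An even score implies no rating gap.
    print(elo_gap_from_score(0.5))
    # Round trip: a 49-point edge and the score it implies.
    print(expected_score(49.0), elo_gap_from_score(expected_score(49.0)))
```

A system that scores close to 50% against opponents of every rating has an implied gap near 0 at every level, which is what good skill calibration means here.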