Lookahead Pathology in Monte-Carlo Tree Search

Khoi P. N. Nguyen, Raghuram Ramanujan

TL;DR

This work investigates whether Monte-Carlo Tree Search with UCT can exhibit lookahead pathology in adversarial settings. It introduces a novel family of critical win-loss games with a controllable critical rate $\gamma$ for studying how deeper search affects decision quality, enabling both theoretical analysis and scalable experiments. The main theoretical result (Theorem 1) shows that with $\gamma=1$ and a sufficiently large exploration constant $c$, UCT can be driven to pathological behavior, a finding supported by extensive simulations across parameter settings; smaller values of $\gamma$ weaken or eliminate the effect. The study raises practical concerns for deploying UCT-based planners and points to future work on generalizing the results to $\gamma \neq 1$, tightening the bounds on $c$, and exploring mitigation strategies and applicability to real domains.
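To make the role of the exploration constant $c$ concrete, here is a minimal sketch of the UCB1-style child selection that UCT applies at each node. It assumes the common form $\bar{X}_j + c\sqrt{\ln n / n_j}$, where $n$ is the parent's visit count and $n_j$ the child's; the paper's exact scaling of $c$ may differ, and `ChildStats`, `uct_score`, and `select_child` are hypothetical names chosen for illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class ChildStats:
    """Running statistics for one child of a node (hypothetical structure)."""
    visits: int
    total_reward: float  # sum of rewards, from the parent's point of view


def uct_score(child: ChildStats, parent_visits: int, c: float) -> float:
    """Mean reward plus an exploration bonus scaled by c.

    Assumes the form mean + c * sqrt(ln(n) / n_j); the paper may place
    the constant differently.
    """
    if child.visits == 0:
        return math.inf  # unvisited children are tried first
    mean = child.total_reward / child.visits
    return mean + c * math.sqrt(math.log(parent_visits) / child.visits)


def select_child(children: list[ChildStats], c: float) -> int:
    """Return the index of the child with the highest UCT score."""
    # Parent visit count, taken here as the sum of the children's visits.
    parent_visits = sum(ch.visits for ch in children)
    scores = [uct_score(ch, parent_visits, c) for ch in children]
    return max(range(len(children)), key=scores.__getitem__)


# Child 0: well explored, mean reward 0.9. Child 1: rarely visited, mean -1.0.
children = [ChildStats(visits=10, total_reward=9.0),
            ChildStats(visits=2, total_reward=-2.0)]
print(select_child(children, c=0.1))    # 0
print(select_child(children, c=100.0))  # 1
```

With a small $c$ the well-explored, high-value child wins; with a large $c$ the exploration bonus swamps the value estimates and the rarely visited, losing child is selected instead, which hints at why a very large exploration constant can degrade decisions.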

Abstract

Monte-Carlo Tree Search (MCTS) is a search paradigm that first found prominence with its success in the domain of computer Go. Early theoretical work established soundness and convergence bounds for Upper Confidence bounds applied to Trees (UCT), the most popular instantiation of MCTS; however, there remain notable gaps in our understanding of how UCT behaves in practice. In this work, we address one such gap by considering whether UCT can exhibit lookahead pathology in adversarial settings. Lookahead pathology is a paradoxical phenomenon, first observed in Minimax search, in which greater search effort leads to worse decision-making. We introduce a novel family of synthetic games that offer rich modeling possibilities while remaining amenable to mathematical analysis. Our theoretical and experimental results suggest that UCT is indeed susceptible to pathological behavior in a range of games drawn from this family.
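Because lookahead pathology means decision quality gets worse as search effort grows, one way to probe for it is to measure how often a searcher picks a losing move at several budgets. The harness below is a hypothetical sketch, not the paper's experimental code: `search` and `is_losing_move` are placeholders for a real search routine (such as UCT) and an oracle that knows the true game values, and `looks_pathological` is a deliberately crude check that only compares the largest budget against the smallest.

```python
from typing import Callable, Sequence, TypeVar

P = TypeVar("P")  # position type (placeholder)
M = TypeVar("M")  # move type (placeholder)


def decision_error_rate(
    search: Callable[[P, int], M],
    is_losing_move: Callable[[P, M], bool],
    positions: Sequence[P],
    budget: int,
) -> float:
    """Fraction of positions where a search with `budget` picks a losing move."""
    if not positions:
        raise ValueError("need at least one test position")
    errors = sum(is_losing_move(pos, search(pos, budget)) for pos in positions)
    return errors / len(positions)


def error_curve(
    search: Callable[[P, int], M],
    is_losing_move: Callable[[P, M], bool],
    positions: Sequence[P],
    budgets: Sequence[int],
) -> list[float]:
    """Decision error rate at each search budget, in the order given."""
    return [decision_error_rate(search, is_losing_move, positions, b)
            for b in budgets]


def looks_pathological(error_rates: Sequence[float]) -> bool:
    """Crude check: decisions at the largest budget are worse than at the smallest."""
    return error_rates[-1] > error_rates[0]
```

A real study would average over many sampled games and inspect the whole curve (as in Figure 5, plotted against a log-scaled budget) rather than rely on a single endpoint comparison.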

Paper Structure (26 sections, 3 theorems, 19 equations, 13 figures)

Key Result

Theorem 1

In a critical win-loss game with $\gamma=1.0$, UCT with a search budget of $N$ nodes will exhibit lookahead pathology for choices of the exploration parameter $c \geq \sqrt{\frac{N^3}{2 \log{N}}}$, even with access to a perfect heuristic.
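The bound in Theorem 1 is easy to evaluate numerically. The sketch below assumes that $\log$ denotes the natural logarithm (the base is not stated in this summary); `pathology_threshold` is a hypothetical helper name.

```python
import math


def pathology_threshold(n: int) -> float:
    """Exploration constant at or above which Theorem 1 guarantees pathology.

    Computes sqrt(N^3 / (2 log N)) for a search budget of N nodes,
    assuming log is the natural logarithm.
    """
    if n < 2:
        raise ValueError("budget must be at least 2 nodes so that log N > 0")
    return math.sqrt(n**3 / (2 * math.log(n)))


for budget in (10, 100, 1_000, 10_000):
    print(f"N = {budget:>6}: c >= {pathology_threshold(budget):,.1f}")
```

The threshold grows roughly like $N^{3/2}$; for $N = 100$ it is already about $329.5$. Since Theorem 1 states a sufficient condition rather than a necessary one, pathology may appear at smaller values of $c$, and tightening this bound is listed as future work.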

Figures (13)

  • Figure 1: An example of a forced node (left) and a choice node (right). Upward-facing triangles represent maximizing nodes while downward-facing triangles represent minimizing nodes.
  • Figure 2: Effect of the critical rate ($\gamma$) on game tree structure. White nodes correspond to $+1$ positions and black nodes to $-1$ positions; the root node is at the center. The tree instances were generated with $\gamma=0.1$, $\gamma=0.5$, and $\gamma=1.0$, from left to right.
  • Figure 3: Histograms of empirical critical rates ($\tilde{\gamma}$) for Chess positions sampled $p=10$ plies (top row) and $p=36$ plies (bottom row) into the game. Positions are sampled using both light playouts (left column) and heavy playouts (right column).
  • Figure 4: Distribution of Stockfish 13 static evaluations of $+1$ and $-1$ positions sampled $p=10$ plies deep into Chess. The positions are sampled using both light playouts (left) and heavy playouts (right).
  • Figure 5: Measuring pathological behavior in UCT on critical win-loss games of depth $50$ with $\gamma=0.9$ (left) and $\gamma=1$ (right). The heuristic used to guide UCT is constructed from histograms of Stockfish evaluations of positions sampled at depth $10$, using both light and heavy playouts. Each colored line corresponds to an instance of UCT with a different exploration constant. The $x$-axis is plotted on a log scale.
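Figure 1 distinguishes forced nodes from choice nodes, and Figure 3 reports empirical critical rates. The sketch below is one plausible reading, not the paper's definition: it treats a node as a choice node when its children have different minimax values (so the mover's decision changes the outcome), and it estimates a critical rate as the fraction of internal nodes won by the side to move that are choice nodes. Nodes lost by the side to move are excluded because, under minimax, every move from them loses, so they are always forced. `Node`, `minimax`, `is_choice_node`, and `empirical_critical_rate` are hypothetical names.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    """A node in a win-loss game tree (hypothetical representation).

    `maximizing` says whose turn it is; leaves carry +1 or -1, always
    from the maximizing player's point of view.
    """
    maximizing: bool
    children: list[Node] = field(default_factory=list)
    leaf_value: int = 0  # only meaningful when `children` is empty


def minimax(node: Node) -> int:
    """Exact game value (+1 or -1) from the maximizing player's perspective."""
    if not node.children:
        return node.leaf_value
    values = [minimax(child) for child in node.children]
    return max(values) if node.maximizing else min(values)


def is_choice_node(node: Node) -> bool:
    """Assumed reading of Figure 1: children of differing value mean the move
    matters (choice node); identical child values mean the node is forced."""
    return len({minimax(child) for child in node.children}) > 1


def empirical_critical_rate(root: Node) -> float:
    """Fraction of internal nodes won by the side to move that are choice nodes.

    Illustrative estimator only. Values are recomputed recursively, which is
    fine for small toy trees.
    """
    winning, choice = 0, 0
    stack = [root]
    while stack:
        node = stack.pop()
        if not node.children:
            continue
        mover_wins = (minimax(node) == 1) == node.maximizing
        if mover_wins:
            winning += 1
            if is_choice_node(node):
                choice += 1
        stack.extend(node.children)
    return choice / winning if winning else 0.0


# Root (max) can win immediately or enter a min node where min has a choice.
tree = Node(True, [
    Node(False, leaf_value=1),
    Node(False, [Node(True, leaf_value=1), Node(True, leaf_value=-1)]),
])
print(empirical_critical_rate(tree))  # 1.0
```

Here both internal nodes are won by the side to move and both offer a losing alternative, so every winning node is a choice node under this assumed definition.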

Theorems & Definitions (5)

  • Theorem 1
  • Proposition 1
  • Proof
  • Theorem 1
  • Proof