Playing Board Games with the Predict Results of Beam Search Algorithm

Sergey Pastukhov

TL;DR

PROBS tackles deterministic, two-player, perfect-information games by predicting beam-search outcomes with two neural networks: a value network $V_\theta(s)$ that estimates terminal utilities and an action-value network $Q_\phi(s,a)$ that predicts beam-search subtree results. The networks are trained iteratively: self-play selects actions by softmax with Dirichlet exploration, beam-search leaves are evaluated with $V_\theta$, and $Q_\phi$ biases which nodes are expanded. Unlike MCTS-centric approaches such as AlphaZero, PROBS relies on a fixed-depth beam search and demonstrates that limited yet focused exploration can progressively refine the policy and improve performance. Empirical results on Connect Four and additional games show notable Elo improvements against simple baselines, and PROBS can outperform deeper lookahead strategies even with a shallow beam; direct comparisons to AlphaZero are left for future work. Overall, PROBS couples neural prediction with beam-search planning, offering a potential alternative to MCTS-based planning in deterministic games.
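The softmax action selection with Dirichlet exploration mentioned above can be illustrated with a minimal sketch. The summary does not say how the noise is combined with the softmax, so the AlphaZero-style mixing used here is an assumption, as are the function name `select_action` and the parameters `temperature`, `dirichlet_alpha`, and `noise_weight`.

```python
import numpy as np

def select_action(q_values, legal_mask, temperature=1.0,
                  dirichlet_alpha=0.3, noise_weight=0.25, rng=None):
    """Sample an action from softmax(Q / T) mixed with Dirichlet noise.

    Assumption: noise is mixed into the probabilities AlphaZero-style;
    the source only says "softmax with Dirichlet exploration".
    """
    rng = np.random.default_rng() if rng is None else rng
    q = np.asarray(q_values, dtype=np.float64)
    legal = np.asarray(legal_mask, dtype=bool)
    if not legal.any():
        raise ValueError("no legal actions")

    # Softmax over legal actions only; illegal actions get probability 0.
    logits = np.where(legal, q / temperature, -np.inf)
    logits = logits - logits[legal].max()  # numerical stability
    probs = np.exp(logits)
    probs /= probs.sum()

    # Dirichlet noise restricted to the legal actions.
    noise = np.zeros_like(probs)
    noise[legal] = rng.dirichlet(np.full(int(legal.sum()), dirichlet_alpha))

    mixed = (1.0 - noise_weight) * probs + noise_weight * noise
    mixed /= mixed.sum()
    action = int(rng.choice(len(mixed), p=mixed))
    return action, mixed

# Usage: four actions, the second one illegal in this position.
action, probs = select_action([0.5, -0.2, 0.9, 0.0],
                              [True, False, True, True],
                              rng=np.random.default_rng(0))
```

Setting `noise_weight=0` recovers plain softmax sampling, which is a reasonable choice at evaluation time when exploration is not wanted.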

Abstract

This paper introduces a novel algorithm for two-player deterministic games with perfect information, which we call PROBS (Predict Results of Beam Search). Unlike existing methods that predominantly rely on Monte Carlo Tree Search (MCTS) for decision processes, our approach leverages a simpler beam search algorithm. We evaluate the performance of our algorithm across a selection of board games, where it consistently demonstrates an increased winning ratio against baseline opponents. A key result of this study is that the PROBS algorithm operates effectively, even when the beam search size is considerably smaller than the average number of turns in the game.
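To make the idea of a bounded search concrete, here is a small sketch of a depth- and width-limited negamax on a toy subtraction game (take 1 to 3 stones; whoever takes the last stone wins). It is one plausible reading of a beam-style search, not the paper's exact procedure: leaves are scored by a stand-in for $V_\theta$, and a stand-in for $Q_\phi$ decides which actions are kept. The toy game and the names `beam_values`, `value_stub`, and `q_stub` are hypothetical.

```python
from typing import Callable, Dict, List

def legal_actions(stones: int) -> List[int]:
    return [a for a in (1, 2, 3) if a <= stones]

def value_stub(stones: int) -> float:
    """Stand-in for V_theta: an uninformed estimate for non-terminal states."""
    return 0.0

def q_stub(stones: int, action: int) -> float:
    """Stand-in for Q_phi: no preference between actions."""
    return 0.0

def beam_values(stones: int, depth: int, width: int,
                value_fn: Callable[[int], float] = value_stub,
                q_fn: Callable[[int, int], float] = q_stub) -> Dict[int, float]:
    """Backed-up value of each expanded root action, for the player to move."""

    def top_actions(s: int) -> List[int]:
        # Keep only the `width` actions ranked highest by the Q stand-in.
        ranked = sorted(legal_actions(s), key=lambda a: q_fn(s, a), reverse=True)
        return ranked[:width]

    def search(s: int, d: int) -> float:
        if s == 0:
            return -1.0          # the previous player took the last stone
        if d == 0:
            return value_fn(s)   # leaf: evaluated by the value stand-in
        return max(-search(s - a, d - 1) for a in top_actions(s))

    return {a: -search(stones - a, depth - 1) for a in top_actions(stones)}

# Usage: from 5 stones, a depth-3 search with all actions kept.
root_values = beam_values(5, depth=3, width=3)
best_action = max(root_values, key=root_values.get)
```

In the method described here, per-action search results of this kind are what the action-value network is trained to predict, so that later searches can spend a small expansion budget on the most promising branches.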

Paper Structure

This paper contains 6 sections, 2 figures, 2 algorithms.

Introduction
Related Work
The PROBS Algorithm
Empirical evaluation
Conclusion, limitations and future work
Configuration

Figures (2)

Figure 1: (left) Training the PROBS algorithm on the Connect Four board game using various model sizes. (right) Training the PROBS algorithm with varying depth limits for beam search.
Figure 2: Training the PROBS algorithm on Toguz-Kumalak and Reversi (Othello).
