Neural Network-based Information Set Weighting for Playing Reconnaissance Blind Chess

Timo Bertram, Johannes Fürnkranz, Martin Müller

TL;DR

This work addresses imperfect-information play in Reconnaissance Blind Chess (RBC) by using a Siamese neural network to learn situation-specific weights for the states in an information set. The model embeds the observation history and the candidate boards into a shared space and produces a weight distribution over the possible true states, so that perfect-information evaluations can be combined by weight when choosing a move. Empirically, the Siamese approach outperforms a CNN baseline at predicting the true state within the information set. An RBC agent that weights Stockfish evaluations by the learned distribution is ranked 5th on the public leaderboard. The approach is a general mechanism for turning an uncertain information set into a weighted distribution that planning can use, and it may apply to other imperfect-information tasks. Future work includes RBC-specific policies and search with learned distributions.

Abstract

In imperfect information games, the game state is generally not fully observable to players. Therefore, good gameplay requires policies that deal with the different information that is hidden from each player. To address this, effective algorithms often reason about information sets: the sets of all possible game states that are consistent with a player's observations. While there is no way to distinguish between the states within an information set, this property does not imply that all states are equally likely to occur in play. We extend previous research on assigning weights to the states in an information set in order to facilitate better gameplay in the imperfect information game of Reconnaissance Blind Chess (RBC). For this, we train two different neural networks that estimate the likelihood of each state in an information set from historical game data. Experimentally, we find that a Siamese neural network achieves higher accuracy and is more efficient than a classical convolutional neural network for the given domain. Finally, we evaluate an RBC-playing agent that is based on the generated weightings and compare different parameter settings that influence how strongly it relies on them. The resulting best player is ranked 5th on the public leaderboard.
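
To make the idea of a weighted combination of perfect-information evaluations concrete, here is a minimal sketch. It is not the authors' agent: the function name `select_move`, the move labels, and all scores are hypothetical, and it assumes every candidate move has an evaluation on every board (in real RBC a move may be illegal on some boards). The numbers are chosen to mirror the situation described for Figure 2, where the best overall move is suboptimal on each individual board.

```python
from typing import Dict, List

def select_move(weights: List[float], evals: List[Dict[str, float]]) -> str:
    """Return the move with the highest weight-averaged evaluation.

    weights[i] is the (hypothetical) probability that board i is the true one;
    evals[i][move] is a perfect-information score of `move` on board i.
    Assumes every board's dict contains the same set of moves.
    """
    total = sum(weights)
    moves = sorted(evals[0].keys())  # sorted for a deterministic tie-break

    def expected(move: str) -> float:
        # Weighted average of the per-board scores for this move.
        return sum(w * e[move] for w, e in zip(weights, evals)) / total

    return max(moves, key=expected)

# Two candidate boards with made-up scores in [-1, 1].
evals = [
    {"a": 1.0, "b": -1.0, "c": 0.6},   # board 0: "a" is best
    {"a": -1.0, "b": 1.0, "c": 0.6},   # board 1: "b" is best
]

uniform_choice = select_move([0.5, 0.5], evals)     # neither board favoured
confident_choice = select_move([0.95, 0.05], evals)  # board 0 strongly favoured
print(uniform_choice, confident_choice)
```

With uniform weights the compromise move `c` wins even though it is best on neither board. Once the weighting strongly favours board 0, the board-specific move `a` is chosen instead, which is why a sharper weighting over the information set can change play.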

Paper Structure (28 sections, 2 equations, 7 figures, 2 tables, 1 algorithm)

Figures (7)

  • Figure 1: Schematic overview of using a Siamese neural network for weighting RBC board states in an information set. The observation is preprocessed by an observation encoding network, while both the real board and a sampled incorrect board are preprocessed by a board encoding network. Next, all are input into the Siamese network. The distances of the outputs in the embedding space model the probabilities of boards being true given the observations.
  • Figure 2: Example black-to-move position, where the best overall move is suboptimal on either board.
  • Figure 3: Process of generating a weighting over an information set. All boards and the current history of observations are fed into the Siamese Neural Network. The distances in the embedding space between the boards and the observations are computed and used to yield a weighting over the boards.
  • Figure 4: Comparison of top-k-percent accuracy of the Siamese network to random, Stockfish, StrangeFish2, and CNN ranking. The average size of the information set is 1100. The Siamese network vastly outperforms the three non-neural baselines (random, Stockfish, and StrangeFish2). The CNN also performs better than those baselines, but the Siamese network creates better rankings by a noticeable margin: the CNN identifies the correct board with an accuracy of 48.82%, while the Siamese network achieves 52.91%.
  • Figure 5: Comparison of pick-distance of the true board when using the Siamese network, the classical CNN, Stockfish, StrangeFish2, and random ranking. Individual samples per method have small uniform noise added on the x-axis for better visualisation. The Siamese network has some outliers due to the stochastic nature of the task, but generally ranks the correct board highly, with a median rank of 1.
  • ...and 2 more figures
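
The captions for Figures 3 to 5 describe turning embedding distances into a weighting and then scoring how highly the true board is ranked. The sketch below illustrates that pipeline under stated assumptions. The captions do not give the exact distance-to-weight mapping, so a softmax over negative distances is used here as a stand-in, and the helper names, the `temperature` parameter, and the metric definitions (rank 1 means the highest weight; top-k-percent means the true board falls within the top k% of the ranking) are one reading of the captions rather than the paper's definitions.

```python
import math
from typing import List, Sequence

def weights_from_distances(distances: Sequence[float],
                           temperature: float = 1.0) -> List[float]:
    """Map embedding distances to weights: smaller distance, larger weight.

    Assumption: softmax over negative distances. The minimum is subtracted
    first for numerical stability; this does not change the result.
    """
    d_min = min(distances)
    scores = [math.exp(-(d - d_min) / temperature) for d in distances]
    total = sum(scores)
    return [s / total for s in scores]

def rank_of_true_board(weights: Sequence[float], true_index: int) -> int:
    """1-based rank of the true board; ties are resolved in its favour."""
    return 1 + sum(1 for w in weights if w > weights[true_index])

def in_top_k_percent(weights: Sequence[float], true_index: int,
                     k_percent: float) -> bool:
    """True if the true board is within the top k% of the ranked boards."""
    cutoff = max(1, math.ceil(len(weights) * k_percent / 100))
    return rank_of_true_board(weights, true_index) <= cutoff

# Hypothetical distances between the observation embedding and four boards.
distances = [0.2, 1.5, 0.9, 3.0]
weights = weights_from_distances(distances)

print([round(w, 3) for w in weights])
print(rank_of_true_board(weights, true_index=0))
print(in_top_k_percent(weights, true_index=2, k_percent=50))
```

Averaging `in_top_k_percent` over many positions would give a top-k-percent accuracy curve of the kind Figure 4 compares, and collecting `rank_of_true_board` per position would give a distribution comparable in spirit to Figure 5.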