Neural Network-based Information Set Weighting for Playing Reconnaissance Blind Chess
Timo Bertram, Johannes Fürnkranz, Martin Müller
TL;DR
This work addresses imperfect-information play in Reconnaissance Blind Chess (RBC) by learning situation-specific weights for the states in an information set with a Siamese neural network. By embedding the observation history and candidate boards into a shared space, the model produces a weight distribution over possible true states, enabling a weighted combination of perfect-information evaluations for decision making. Empirical results show that the Siamese approach outperforms a CNN baseline in predicting the true state within the information set and enables an RBC agent that, using Stockfish evaluations weighted by the learned distribution, ranks 5th on the public leaderboard. The method provides a general mechanism for turning uncertain information sets into weighted state distributions that a planner can use, with potential applicability to other imperfect-information tasks; future work includes RBC-specific policies and search with learned distributions.
Abstract
In imperfect information games, the game state is generally not fully observable to players. Therefore, good gameplay requires policies that deal with the different information that is hidden from each player. To address this, effective algorithms often reason about information sets, i.e., the sets of all possible game states that are consistent with a player's observations. While a player cannot distinguish between the states within an information set, this does not imply that all states are equally likely to occur in play. We extend previous research on assigning weights to the states in an information set in order to facilitate better gameplay in the imperfect information game of Reconnaissance Blind Chess (RBC). For this, we train two different neural networks which estimate the likelihood of each state in an information set from historical game data. Experimentally, we find that a Siamese neural network achieves higher accuracy and is more efficient than a classical convolutional neural network for the given domain. Finally, we evaluate an RBC-playing agent that is based on the generated weightings and compare different parameter settings that influence how strongly it relies on them. The resulting best player is ranked 5th on the public leaderboard.
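
To make the weighting idea concrete, the following minimal sketch shows one way a weight distribution over an information set could be combined with per-state move evaluations. It is an illustration under stated assumptions, not the agent described here: the softmax over embedding distances, the `sharpness` parameter (standing in for a setting that controls how strongly the agent relies on the weights), and both function names are hypothetical, and the hard-coded scores stand in for engine evaluations such as those from Stockfish. The sketch also assumes every move is scored in every candidate state.

```python
import math


def weights_from_distances(distances, sharpness=1.0):
    """Turn embedding distances into a weight distribution (assumed scheme).

    Smaller distance means higher weight. sharpness=0 ignores the distances
    and yields uniform weights; larger values trust the distances more.
    """
    scores = [-sharpness * d for d in distances]
    top = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def best_move(move_scores_per_state, weights):
    """Pick the move with the highest weighted sum of per-state scores.

    move_scores_per_state holds one dict (move -> score) per candidate state,
    in the same order as weights.
    """
    expected = {}
    for scores, weight in zip(move_scores_per_state, weights):
        for move, score in scores.items():
            expected[move] = expected.get(move, 0.0) + weight * score
    return max(expected, key=expected.get), expected


# Hypothetical information set with three candidate boards.
distances = [0.2, 1.5, 3.0]  # assumed distances between history and board embeddings
evaluations = [              # placeholder scores, one dict per candidate board
    {"e2e4": 0.3, "d1h5": -0.5},
    {"e2e4": 0.1, "d1h5": 0.9},
    {"e2e4": -0.2, "d1h5": 1.0},
]

uniform_choice, _ = best_move(evaluations, weights_from_distances(distances, sharpness=0.0))
weighted_choice, _ = best_move(evaluations, weights_from_distances(distances, sharpness=5.0))
```

In this toy example the uniform weighting prefers `d1h5`, because that move scores well on two of the three boards, while the sharper weighting prefers `e2e4`, because the board closest to the observation history dominates the sum. This is the kind of difference a learned weighting is meant to produce.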
