Transformer Based Planning in the Observation Space with Applications to Trick Taking Card Games

Douglas Rebstock, Christopher Solinas, Nathan R. Sturtevant, Michael Buro

TL;DR

This work tackles planning under imperfect information in trick-taking card games by proposing GO-MCTS, a method that performs Monte Carlo Tree Search in the space of observation histories, using a transformer-based generative model to simulate the next observation. The transformer is trained via neural fictitious self-play in a population-based setting, enabling iterative policy improvement without access to the true hidden state. Evaluated on Hearts, Skat, and The Crew, GO-MCTS achieves state-of-the-art performance in Hearts and The Crew, while results in more complex domains such as Skat expose a trade-off between playing strength and computation time. The approach demonstrates the viability of combining observation-space planning with transformer-based dynamics in large, uncertain domains, and points to future work on efficiency and on generalization to broader imperfect-information tasks.
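
The core idea, searching over observation histories and letting a generative model supply the next observation, can be illustrated with a minimal sketch. Everything in it is a hypothetical assumption rather than the authors' implementation: the toy game with actions "a"/"b" and observations "x"/"y", the `generative_model` lookup table standing in for the trained transformer, the payoff in `reward`, and the plain UCB1 selection rule. Note that no hidden state is ever sampled; opponent and chance events enter the tree only as observation tokens drawn from the model.

```python
import math
import random
from collections import defaultdict

# Hypothetical toy game over observation tokens: the agent acts at even
# positions ("a" or "b"); at odd positions the environment (opponents or
# chance) emits an observation "x" or "y". The game ends after 4 tokens.
HORIZON = 4
ACTIONS = ("a", "b")


def is_terminal(history):
    return len(history) >= HORIZON


def is_agent_turn(history):
    return len(history) % 2 == 0


def reward(history):
    # Hypothetical payoff: fraction of environment observations equal to "x".
    env_obs = history[1::2]
    return sum(o == "x" for o in env_obs) / len(env_obs)


def generative_model(history):
    # Stand-in for the trained transformer: a distribution over the next
    # observation, conditioned only on the agent's observation history.
    if history[-1] == "b":
        return {"x": 0.9, "y": 0.1}
    return {"x": 0.5, "y": 0.5}


def simulate(history, visits, value_sum, c, rng):
    """Run one MCTS iteration from `history`; return the sampled return."""
    if is_terminal(history):
        value = reward(history)
    elif is_agent_turn(history):
        children = [history + (a,) for a in ACTIONS]
        unvisited = [ch for ch in children if visits[ch] == 0]
        if unvisited:
            child = rng.choice(unvisited)  # expand untried actions first
        else:
            log_n = math.log(visits[history])
            child = max(children, key=lambda ch: value_sum[ch] / visits[ch]
                        + c * math.sqrt(log_n / visits[ch]))  # UCB1 selection
        value = simulate(child, visits, value_sum, c, rng)
    else:
        # Advance the search by sampling the next observation from the model.
        dist = generative_model(history)
        obs = rng.choices(list(dist), weights=list(dist.values()))[0]
        value = simulate(history + (obs,), visits, value_sum, c, rng)
    visits[history] += 1
    value_sum[history] += value
    return value


def go_mcts(root_history=(), iterations=3000, c=0.7, seed=0):
    rng = random.Random(seed)
    visits, value_sum = defaultdict(int), defaultdict(float)
    for _ in range(iterations):
        simulate(root_history, visits, value_sum, c, rng)
    # Recommend the most-visited action at the root.
    return max(ACTIONS, key=lambda a: visits[root_history + (a,)])


if __name__ == "__main__":
    print(go_mcts())
```

In this toy, "b" makes the favourable observation "x" more likely, so the search should concentrate its visits there. Storing every visited history in a dictionary is only workable for a tiny game, and recommending the most-visited root action is a common MCTS convention assumed here, not a detail taken from the paper.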

Abstract

Traditional search algorithms struggle when applied to games of imperfect information in which the number of possible underlying states and trajectories is very large. This challenge is particularly evident in trick-taking card games. While state sampling techniques such as Perfect Information Monte Carlo (PIMC) search have shown success in these contexts, they still have major limitations. We present Generative Observation Monte Carlo Tree Search (GO-MCTS), which applies MCTS to observation sequences generated by a game-specific model. This method performs the search within the observation space and advances the search using a model that depends solely on the agent's observations. Additionally, we demonstrate that transformers are well-suited as the generative model in this context, and we present a process for iteratively training the transformer via population-based self-play. The efficacy of GO-MCTS is demonstrated in various games of imperfect information, such as Hearts, Skat, and "The Crew: The Quest for Planet Nine," with promising results.
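
For contrast, PIMC can be sketched in a few lines: sample hidden states consistent with what the agent has observed, evaluate each action as if the sampled state were fully known, and average. The abstract does not say which limitations it has in mind; the toy below, whose names and payoffs are hypothetical, illustrates one well-known failure mode, strategy fusion, in which the perfect-information solver tailors later decisions to each sampled world even though the real agent cannot.

```python
import random

# Hypothetical toy decision: a prize is hidden under cup "L" or "R" with equal
# probability, and the agent never observes which one.
WORLDS = ("L", "R")


def perfect_info_value(world, action):
    """Value of `action` once the hidden world is revealed (the PIMC assumption)."""
    if action == "safe":
        return 0.8
    # "gamble": the agent later picks a cup. A perfect-information solver
    # knows the world, so it always picks the cup that holds the prize.
    return max(1.0 if cup == world else 0.0 for cup in WORLDS)


def pimc(actions, num_samples=100, seed=0):
    """Perfect Information Monte Carlo: sample worlds, solve each, average."""
    rng = random.Random(seed)
    totals = dict.fromkeys(actions, 0.0)
    for _ in range(num_samples):
        world = rng.choice(WORLDS)  # determinize the hidden state
        for action in actions:
            totals[action] += perfect_info_value(world, action)
    return {action: totals[action] / num_samples for action in actions}


estimates = pimc(("safe", "gamble"))
print(estimates)
# PIMC rates "gamble" at 1.0 and prefers it, yet an agent that cannot see the
# prize only guesses a cup and wins 0.5 on average, so "safe" (0.8) is better.
```

Searching over the agent's own observations, as GO-MCTS does, avoids granting the planner knowledge it will not actually have at later decisions.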

Paper Structure (27 sections, 5 equations, 3 figures, 6 tables, 3 algorithms)

Figures (3)

  • Figure 1: An example of an imperfect information game. $P_2$ cannot distinguish between states inside the information set (dotted rectangle) since they could not observe the private move made by $P_1$.
  • Figure 2: Average success rate for each mission in each training iteration (not using search).
  • Figure 3: Average success rate in the final training iteration for the ArgmaxVal* and GO-MCTS players, broken down by mission.
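
The information set described in Figure 1 can be made concrete by grouping histories that look identical to a player. The two-step game below, its move names, and the `p2_observation` function are hypothetical illustrations, not the game drawn in the figure. P2 ends up with one information set per public move, each containing both of P1's possible private moves; roughly this observation-level view, rather than the underlying state, is what GO-MCTS searches over.

```python
from collections import defaultdict

# Hypothetical two-step history in the spirit of Figure 1: P1 makes a private
# move, then a public move that everyone sees.
histories = [(private, public)
             for private in ("left", "right")
             for public in ("pass", "bid")]


def p2_observation(history):
    # P2 sees only the public move; the private move is hidden.
    _private, public = history
    return public


def information_sets(histories, observe):
    """Group histories that produce identical observations for a player."""
    groups = defaultdict(list)
    for history in histories:
        groups[observe(history)].append(history)
    return dict(groups)


for obs, members in information_sets(histories, p2_observation).items():
    print(obs, members)
```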