Playing Hex and Counter Wargames using Reinforcement Learning and Recurrent Neural Networks

Guilherme Palma, Pedro A. Santos, João Dias

TL;DR

Hex and Counter Wargames pose a hard adversarial planning problem: they are played on hex grids with large maps and unit stacking. The paper proposes an AlphaZero-style framework built on a novel Recall-based fully convolutional recurrent network, with tailored state and action representations and a 256-channel latent space. The network is trained on small maps and is intended to extrapolate to larger boards by running more recurrent iterations. The state is encoded with $19S + 12(R+1)$ input channels, and the action space has size $(9S + 3) \times \text{Height} \times \text{Width}$. Key contributions include the hex-specific input/output design, a freely available environment and asynchronous AlphaZero implementation, and curriculum learning to handle increasing complexity. Results show strong performance on diverse small-map scenarios but only limited extrapolation to larger maps, which motivates future work on architectural refinements (e.g., depthwise/separable convolutions) and adaptive computation approaches to scale AI for Hex and Counter Wargames.
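To make the two size formulas in the TL;DR concrete, the following minimal sketch evaluates them for a couple of board sizes. This summary does not define $S$ and $R$, so the code treats them simply as non-negative integer parameters. The function names and the example values are illustrative assumptions, not taken from the paper or its released code.

```python
def input_channels(S: int, R: int) -> int:
    """Input channel count 19S + 12(R + 1), as quoted in the TL;DR.

    S and R are treated as opaque integer parameters here; their
    meaning is not defined in this summary.
    """
    return 19 * S + 12 * (R + 1)


def action_space_size(S: int, height: int, width: int) -> int:
    """Action space size (9S + 3) * Height * Width, as quoted in the TL;DR."""
    return (9 * S + 3) * height * width


if __name__ == "__main__":
    S, R = 2, 3  # arbitrary illustrative values (assumption)
    for height, width in [(5, 5), (10, 10)]:
        print(
            f"{height}x{width} map: "
            f"{input_channels(S, R)} input channels, "
            f"{action_space_size(S, height, width)} actions"
        )
```

Under these formulas the channel count does not depend on the map dimensions, while the action space grows linearly with the number of hexes. That is consistent with applying the same fully convolutional network to boards of different sizes.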

Abstract

Hex and Counter Wargames are adversarial two-player simulations of real military conflicts requiring complex strategic decision-making. Unlike classical board games, these games feature intricate terrain/unit interactions, unit stacking, large maps of varying sizes, and simultaneous move and combat decisions involving hundreds of units. This paper introduces a novel system designed to address the strategic complexity of Hex and Counter Wargames by integrating cutting-edge advancements in Recurrent Neural Networks with AlphaZero, a reliable modern Reinforcement Learning algorithm. The system utilizes a new Neural Network architecture developed from existing research, incorporating innovative state and action representations tailored to these specific game environments. With minimal training, our solution has shown promising results in typical scenarios, demonstrating the ability to generalize across different terrain and tactical situations. Additionally, we explore the system's potential to scale to larger map sizes. The developed system is openly accessible, facilitating continued research and exploration within this challenging domain.

Paper Structure

This paper contains 22 sections, 1 equation, 10 figures, 3 tables.

Figures (10)

  • Figure 1: Input and recurrent modules of the Recall architecture.
  • Figure 2: Diagram of the designed solution.
  • Figure 3: Representation of an attack where all player one units (green) decide to target the enemy armour (red). Note that each action (blue square) would be taken sequentially.
  • Figure 4: Output module architecture.
  • Figure 5: Visualization of the terrain for the asymmetric scenario (left) and symmetric scenario (right).
  • ...and 5 more figures