Playing Hex and Counter Wargames using Reinforcement Learning and Recurrent Neural Networks
Guilherme Palma, Pedro A. Santos, João Dias
TL;DR
Hex and Counter Wargames pose hard adversarial planning problems on hexagonal grids, with large maps and unit stacking. The paper proposes an AlphaZero-style framework built around a novel Recall-based, fully convolutional recurrent network with tailored state and action representations and a 256-channel latent space. The state is encoded with $19S + 12(R+1)$ input channels, and the action space has size $(9S + 3) \times \text{Height} \times \text{Width}$. The design is intended to allow training on small maps and extrapolation to larger boards by increasing the number of recurrent iterations. Key contributions include the hex-specific input/output design, a freely available environment and asynchronous AlphaZero implementation, and curriculum learning to handle increasing complexity. Results show strong performance on diverse small-map scenarios but only limited extrapolation to larger maps, motivating future work on architectural refinements (e.g., depthwise/separable convolutions) and adaptive computation approaches to scale AI for Hex and Counter Wargames.
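To make the two size formulas above concrete, the following minimal Python sketch evaluates them. The summary does not define $S$ and $R$, so they are treated here simply as non-negative integer parameters of the representation; the function names and the example values are illustrative assumptions, not taken from the paper's code.

```python
def input_channels(S: int, R: int) -> int:
    """Number of input channels, 19S + 12(R + 1), as quoted in the summary.

    S and R are the symbols used in the formula; their meaning is not
    defined in this excerpt, so they are plain integers here.
    """
    return 19 * S + 12 * (R + 1)


def action_space_shape(S: int, height: int, width: int) -> tuple[int, int, int]:
    """Shape of the action tensor: (9S + 3) planes over a height x width map."""
    return (9 * S + 3, height, width)


def action_space_size(S: int, height: int, width: int) -> int:
    """Total number of action entries, (9S + 3) * Height * Width."""
    planes, h, w = action_space_shape(S, height, width)
    return planes * h * w


if __name__ == "__main__":
    # Arbitrary example values chosen only for illustration.
    S, R = 2, 3
    print(input_channels(S, R))           # 19*2 + 12*4 = 86
    print(action_space_shape(S, 5, 5))    # (21, 5, 5)
    print(action_space_size(S, 5, 5))     # 21 * 25 = 525
    print(action_space_size(S, 10, 10))   # 21 * 100 = 2100
```

Note that the input channel count depends only on $S$ and $R$, not on the map dimensions, while the action space grows linearly with the number of map cells. This is consistent with a fully convolutional design, where the same weights can be applied to maps of different sizes.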
Abstract
Hex and Counter Wargames are adversarial two-player simulations of real military conflicts requiring complex strategic decision-making. Unlike classical board games, these games feature intricate terrain/unit interactions, unit stacking, large maps of varying sizes, and simultaneous move and combat decisions involving hundreds of units. This paper introduces a novel system designed to address the strategic complexity of Hex and Counter Wargames by integrating cutting-edge advancements in Recurrent Neural Networks with AlphaZero, a reliable modern Reinforcement Learning algorithm. The system utilizes a new Neural Network architecture developed from existing research, incorporating innovative state and action representations tailored to these specific game environments. With minimal training, our solution has shown promising results in typical scenarios, demonstrating the ability to generalize across different terrain and tactical situations. Additionally, we explore the system's potential to scale to larger map sizes. The developed system is openly accessible, facilitating continued research and exploration within this challenging domain.
