Deep Reinforcement Learning for $5\times5$ Multiplayer Go
Brahim Driss, Jérôme Arjonilla, Hui Wang, Abdallah Saffidine, Tristan Cazenave
TL;DR
This work investigates whether state-of-the-art deep reinforcement learning methods for Go can be adapted to a multiplayer setting. By applying AlphaZero and Descent to a $5\times5$ three-player Go variant and benchmarking against UCT, the authors demonstrate that both approaches can learn strong strategies and outperform baseline search in this multi-agent context. AlphaZero shows rapid early gains but can plateau, whereas Descent provides more consistent improvements and balanced performance across players; cross-play analyses reveal complementary strengths. The findings support the viability of DRL in multiplayer board games and point to future work on larger boards, more agents, and additional multiplayer domains.
Abstract
In recent years, computer Go has made substantial progress, with most results obtained thanks to search algorithms (Monte Carlo Tree Search) and Deep Reinforcement Learning (DRL). In this paper, we apply and analyze two recent algorithms that combine search and DRL, AlphaZero and Descent, to automatically learn to play an extended version of the game of Go with more than two players. We show that search combined with DRL improves the level of play even when there are more than two players.
