Deep Reinforcement Learning for 5*5 Multiplayer Go

Brahim Driss; Jérôme Arjonilla; Hui Wang; Abdallah Saffidine; Tristan Cazenave

TL;DR

This work investigates whether state-of-the-art deep reinforcement learning methods for Go can be adapted to a multiplayer setting. By applying AlphaZero and Descent to a $5\times5$ three-player Go variant and benchmarking against UCT, the authors demonstrate that both approaches can learn strong strategies and outperform baseline search in this multi-agent context. AlphaZero shows rapid early gains but can plateau, whereas Descent provides more consistent improvements and balanced performance across players; cross-play analyses reveal complementary strengths. The findings support the viability of DRL in multiplayer board games and point to future work on larger boards, more agents, and additional multiplayer domains.

Abstract

In recent years, computer Go has progressed considerably, with most results obtained by combining search algorithms (Monte Carlo Tree Search) with Deep Reinforcement Learning (DRL). In this paper, we apply and analyze two recent algorithms that combine search and DRL, AlphaZero and Descent, to automatically learn to play an extended version of the game of Go with more than two players. We show that search combined with DRL improves the level of play even when there are more than two players.

Paper Structure (12 sections, 2 equations, 3 figures, 4 tables)

Introduction
Multiplayer Go
Deep Reinforcement Learning
Monte Carlo Tree Search
AlphaZero
Network architecture
Warm-Start self-play
Descent
Experimental Results
Training of AlphaZero and Descent
Black against White and Red
Conclusion

Figures (3)

Figure 1: A game of Multiplayer Go.
Figure 2: AlphaZero network architecture.
Figure 3: AlphaZero (left) and Descent (right) against UCT. The y-axis shows the average points obtained; the x-axis shows training time in hours.
