Mastering the Game of Guandan with Deep Reinforcement Learning and Behavior Regulating
Yifan Yanggong, Hao Pan, Lei Wang
TL;DR
This work tackles Guandan, a challenging imperfect-information card game, by introducing GuanZero, a Deep Monte-Carlo framework augmented with a novel state-action encoding that regulates cooperative behaviors. The approach combines distributed learning, LSTM-based history encoding, and a six-layer feedforward network that estimates $Q(s,a)$ to guide decision-making. Empirically, GuanZero outperforms random and rule-based baselines and benefits notably from explicit behavior regulation (cooperating, dwarfing, assisting), with training converging in under a week. The study advances AI for complex, multi-agent, imperfect-information domains and points to future work on behavior automation and tribute-strategy learning.
Abstract
Games are simplified models of reality and often serve as a favored platform for Artificial Intelligence (AI) research. Much of this research concerns game-playing agents and their decision-making processes. Guandan (literally, "throwing eggs") is a challenging game in which even professional human players sometimes struggle to make the right decision. In this paper we propose a framework named GuanZero that enables AI agents to master this game using Monte-Carlo methods and deep neural networks. The main contribution of this paper is a carefully designed neural network encoding scheme that regulates agents' behavior. We then demonstrate the effectiveness of the proposed framework by comparing it with state-of-the-art approaches.
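
To make the Monte-Carlo idea concrete, the following is a minimal sketch, not the GuanZero implementation. It assumes the common Monte-Carlo choice of using the undiscounted end-of-episode reward as the regression target for every visited state-action pair, and it replaces the deep network with a plain lookup table so that it runs with the standard library alone. The one-step toy game and all names (`mc_targets`, `TabularQ`, `select_action`, `train_toy`) are hypothetical and chosen only for illustration.

```python
import random
from collections import defaultdict


def mc_targets(episode, final_reward):
    """Pair every (state, action) visited in an episode with the final reward.

    Assumption: the undiscounted end-of-game reward is the target for each step.
    """
    return [(state, action, final_reward) for state, action in episode]


class TabularQ:
    """Lookup-table stand-in for a neural Q(s, a) estimator (illustrative only)."""

    def __init__(self, lr=0.1):
        self.q = defaultdict(float)
        self.lr = lr

    def value(self, state, action):
        return self.q[(state, action)]

    def update(self, state, action, target):
        # Move the estimate a small step toward the Monte-Carlo target.
        key = (state, action)
        self.q[key] += self.lr * (target - self.q[key])


def select_action(q, state, legal_actions, epsilon, rng):
    """Epsilon-greedy choice among legal actions using the current Q estimates."""
    if rng.random() < epsilon:
        return rng.choice(legal_actions)
    return max(legal_actions, key=lambda a: q.value(state, a))


def train_toy(episodes=200, epsilon=0.2, seed=0):
    """Train on a hypothetical one-step game: 'play' wins (+1), 'pass' loses (-1)."""
    rng = random.Random(seed)
    q = TabularQ()
    legal = ["pass", "play"]
    for _ in range(episodes):
        action = select_action(q, "s0", legal, epsilon, rng)
        reward = 1.0 if action == "play" else -1.0
        for s, a, g in mc_targets([("s0", action)], reward):
            q.update(s, a, g)
    return q


if __name__ == "__main__":
    q = train_toy()
    print(select_action(q, "s0", ["pass", "play"], 0.0, random.Random(0)))
```

In a deep variant, the table update would presumably be replaced by a gradient step on the squared error between the network's output and the same Monte-Carlo target, with the state-action pair encoded as the network input.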
