Mastering the Game of Guandan with Deep Reinforcement Learning and Behavior Regulating

Yifan Yanggong, Hao Pan, Lei Wang

TL;DR

This work addresses mastering Guandan, a challenging imperfect-information card game, by introducing GuanZero, a Deep Monte-Carlo framework augmented with a novel state-action encoding that regulates cooperative behaviors. The approach combines distributed learning, LSTM-based history encoding, and a six-layer feedforward network to estimate $Q(s,a)$ and guide decision-making. Empirical results show GuanZero outperforms random and rule-based baselines and benefits notably from explicit behavior regulation (cooperating, dwarfing, assisting), with training converging in under a week. The study advances AI in complex, multi-agent, imperfect-information domains and points to future enhancements in behavior automation and tribute-strategy learning.
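To make the role of $Q(s,a)$ concrete, here is a minimal sketch of the Monte-Carlo idea described above: play an episode, label every visited state-action pair with the episode's return, and move the value estimate toward that label. It is not the authors' implementation. `TabularQ`, `select_action`, and `monte_carlo_targets` are hypothetical names, and the lookup table is only a stand-in for the LSTM history encoder and feedforward network.

```python
import random


class TabularQ:
    """Toy Q(s, a) estimator; a stand-in for a neural network (assumption)."""

    def __init__(self, lr=0.1):
        self.lr = lr
        self.values = {}

    def __call__(self, state, action):
        # Unseen state-action pairs default to a value of 0.0.
        return self.values.get((state, action), 0.0)

    def update(self, targets):
        # Move each estimate a fraction lr toward its Monte-Carlo target.
        for state, action, target in targets:
            old = self(state, action)
            self.values[(state, action)] = old + self.lr * (target - old)


def select_action(q_fn, state, legal_actions, epsilon=0.0, rng=random):
    """Epsilon-greedy choice among legal actions using the Q estimate."""
    if rng.random() < epsilon:
        return rng.choice(legal_actions)
    return max(legal_actions, key=lambda a: q_fn(state, a))


def monte_carlo_targets(episode, final_reward, gamma=1.0):
    """Label every (state, action) of one episode with its discounted return.

    The reward is assumed to arrive only at the end of the episode.
    """
    n = len(episode)
    return [
        (state, action, final_reward * gamma ** (n - 1 - t))
        for t, (state, action) in enumerate(episode)
    ]


if __name__ == "__main__":
    q = TabularQ(lr=0.5)
    episode = [("s0", "pass"), ("s1", "bomb")]  # hypothetical states and actions
    q.update(monte_carlo_targets(episode, final_reward=1.0))
    print(select_action(q, "s1", ["pass", "bomb"]))
```

With `epsilon=0.0` the selection is purely greedy; a positive `epsilon` adds exploration during self-play.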

Abstract

Games are a simplified model of reality and often serve as a favored platform for Artificial Intelligence (AI) research. Much of this research concerns game-playing agents and their decision-making processes. The game of Guandan (literally, "throwing eggs") is a challenging game in which even professional human players sometimes struggle to make the right decision. In this paper we propose a framework named GuanZero that enables AI agents to master this game using Monte-Carlo methods and deep neural networks. The main contribution of this paper is a carefully designed neural network encoding scheme that regulates agents' behavior. We then demonstrate the effectiveness of the proposed framework by comparing it with state-of-the-art approaches.

Paper Structure (20 sections, 9 figures, 10 tables)

Figures (9)

  • Figure 1: Indexing scheme of cards
  • Figure 2: State representation of cards
  • Figure 3: Network architecture of GuanZero
  • Figure 4: The distributed learning process of GuanZero
  • Figure 5: History of WR achieved by GuanZero agents playing against DouZero-based ones
  • ...and 4 more figures