DouRN: Improving DouZero by Residual Neural Networks

Yiquan Chen; Yingchao Lyu; Di Zhang

TL;DR

DouDiZhu poses a three-player imperfect-information challenge for deep reinforcement learning (DRL), with large state and action spaces and mixed cooperation-competition dynamics. The authors extend DouZero by integrating residual networks and a call-score bidding module that decides whether the agent should become the landlord, and they compare two residual designs under multi-role evaluation. Results show that DouRN improves win rates over DouZero and can surpass experienced human players; deeper residual stacks yield gains up to a point, after which training cost becomes prohibitive. The work demonstrates the effectiveness of deep residual architectures and a landlord-bidding decision in complex multi-agent card games, and points to Monte Carlo search and off-policy learning as future directions for further efficiency and performance gains.

Abstract

Deep reinforcement learning has made significant progress in games with imperfect information, but its performance in the card game Doudizhu (Chinese Poker/Fight the Landlord) remains unsatisfactory. Doudizhu is different from conventional games as it involves three players and combines elements of cooperation and confrontation, resulting in a large state and action space. In 2021, a Doudizhu program called DouZero (Zha et al., 2021) surpassed previous models without prior knowledge by utilizing traditional Monte Carlo methods and multilayer perceptrons. Building on this work, our study incorporates residual networks into the model, explores different architectural designs, and conducts multi-role testing. Our findings demonstrate that this model significantly improves the winning rate within the same training time. Additionally, we introduce a call scoring system to assist the agent in deciding whether to become a landlord. With these enhancements, our model consistently outperforms the existing version of DouZero and even experienced human players. The source code is available at https://github.com/Yingchaol/Douzero_Resnet.git.

Paper Structure (9 sections, 5 figures, 4 tables)

Introduction
Related Work
Methodology
Residual Networks
Bidding System
Experiments
Evaluation on Residual Networks
Evaluation on Bidding System
Conclusion and Future Work

Figures (5)

Figure 1: The architecture of a residual block.
Figure 2: Several residual blocks are inserted before the MLP.
Figure 3: The original 6-layer MLP is replaced with residual blocks.
Figure 4: The input to the bidding system consists of the player's hand and the scores called by the two opposing players. The opponents' scores are converted into a one-hot matrix.
Figure 5: The architecture of the bidding system. Dropout randomly discards some neurons, which are then excluded from training; the dropout probability is indicated in parentheses.
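
The captions for Figures 1 and 4 mention a residual block and a one-hot encoding of the opponents' scores. The NumPy sketch below illustrates both ideas in their generic form. It is not the authors' implementation: the two-dense-layer block, the feature width `dim`, the helper names, and the assumption that a called score is an integer from 0 to 3 are all hypothetical choices made for illustration.

```python
import numpy as np


def relu(x):
    """Element-wise rectified linear unit."""
    return np.maximum(x, 0.0)


def residual_block(x, w1, b1, w2, b2):
    """Generic residual block: y = relu(x + F(x)).

    F is assumed here to be two dense layers with a ReLU in between;
    the layers inside the paper's block may differ.
    """
    h = relu(x @ w1 + b1)
    f = h @ w2 + b2
    return relu(x + f)  # the skip connection adds the input back


def one_hot_bids(bids, num_levels=4):
    """Encode each opponent's called score as a one-hot row.

    Assumes scores are integers in 0..num_levels-1 (hypothetical range).
    """
    bids = np.asarray(bids, dtype=int)
    out = np.zeros((bids.size, num_levels), dtype=np.float32)
    out[np.arange(bids.size), bids] = 1.0
    return out


rng = np.random.default_rng(0)
dim = 8  # hypothetical feature width
x = rng.normal(size=(2, dim))
w1 = rng.normal(size=(dim, dim)) * 0.1
b1 = np.zeros(dim)
w2 = rng.normal(size=(dim, dim)) * 0.1
b2 = np.zeros(dim)

y = residual_block(x, w1, b1, w2, b2)
bid_matrix = one_hot_bids([0, 2])  # two opponents calling 0 and 2
```

Because the block adds its input to the learned transformation, setting all weights to zero reduces it to `relu(x)`, which is one way to see why stacking such blocks does not make the identity mapping harder to represent.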
