Advancing DRL Agents in Commercial Fighting Games: Training, Integration, and Agent-Human Alignment

Chen Zhang; Qiang He; Zhou Yuan; Elvis S. Liu; Hong Wang; Jian Zhao; Yang Wang

TL;DR

The paper tackles deploying DRL agents in a large-scale commercial fighting game, where hundreds of characters and real-time interaction pose training and generalization challenges. It introduces Shūkai, a unified model with Heterogeneous League Training (HELT) that combines three input structures (FIS, QS, FQS) and multi-style rewards to balance competence, generalization, and human alignment. HELT accelerates training and broadens policy coverage, achieving a 22% improvement in sample efficiency, while competence remains consistent on characters not seen during training. Real-world deployment in Naruto Mobile demonstrates tangible benefits in player engagement and retention, advancing the practical integration of DRL agents into large-scale commercial games.

Abstract

Deep Reinforcement Learning (DRL) agents have demonstrated impressive success in a wide range of game genres. However, existing research primarily focuses on optimizing DRL competence rather than addressing the challenge of prolonged player interaction. In this paper, we propose a practical DRL agent system for fighting games named Shūkai, which has been successfully deployed to Naruto Mobile, a popular fighting game with over 100 million registered users. Shūkai quantifies the state to enhance generalizability, introducing Heterogeneous League Training (HELT) to achieve balanced competence, generalizability, and training efficiency. Furthermore, Shūkai implements specific rewards to align the agent's behavior with human expectations. Shūkai's ability to generalize is demonstrated by its consistent competence across all characters, even though it was trained on only 13% of them. Additionally, HELT exhibits a remarkable 22% improvement in sample efficiency. Shūkai serves as a valuable training partner for players in Naruto Mobile, enabling them to enhance their abilities and skills.

Paper Structure (42 sections, 16 equations, 11 figures, 2 tables)

Introduction
Preliminaries
Naruto Mobile
Problem Formulation
Method
Heterogeneous Agents
Heterogeneous League Training
Agent-Human Alignment
Policy Improvement
Experiments
Experimental Setup
Results
Real-world application of Shūkai
Human Evaluation
Agent-Human Alignment
...and 27 more sections

Figures (11)

Figure 1: The interface of Naruto Mobile. The player selects a ninja from the character pool to fight against the opponent. The winning condition is to defeat the opponent. Each ninja has three skill buttons and a punch button, controlled by a virtual joystick. The hitbox and hurt box serve as fundamental game mechanics in Naruto Mobile, although they are not visible to the players. The substitution consumes energy and can be used to counter enemy attacks, creating opportunities for counterattacks. Scrolls and summons are additional skills with special effects, such as providing buffs.
Figure 2: Illustration of HELT. As time progresses, the main agent undergoes continuous training. Once it meets the win rate condition or reaches the timeout, a copy of the main agent is added to the policy pool. The main agent's objective is to defeat all opponents. Simultaneously, the main exploiter battles the main agent to discover its weaknesses. After meeting the win rate condition or reaching the timeout, the main exploiter is reset, and its copy is added to the policy pool. The league exploiter fights all agents, and once the win rate condition or timeout is met, its copy is also added to the policy pool, with a 25% probability of being reset. More details can be found in the full paper.
Figure 3: The learning structure of FIS, QS, and FQS. FIS uses both ID and attribution information (numerical information) to model self and opponent. QS uses ID information to model self and only attribution information to model the opponent. FQS models self and opponent with attribution information only. In this figure, the red dashed box contains FIS, the blue dashed box contains QS, and the green dashed box contains FQS. These features are processed by the network and then concatenated with the environment feature; the policy uses the concatenated features to predict actions.
Figure 4: (a) Competence of different structures serving as the main agent in HELT; the FIS agent outperforms the other agents. (b) Generalizability of different structures serving as the main agent in HELT; the competence of the FIS agent drops by 36%, while the competence of the QS and FQS agents remains stable. (c) Training efficiency of HELT. After 70 hours of training, HELT achieves a 22% improvement in competence compared to the QS network with homogeneous league training (HOLT).
Figure 5: Competence of three levels of Shūkai competing against human players in in-game matches. The rightmost column shows the Elo score of each agent level after 30 days. After 30 days, the Elo scores of Shūkai beginner and Shūkai intermediate decrease, indicating that Shūkai can help players improve their abilities. Shūkai advanced maintains its Elo score, suggesting that it remains challenging even for skilled players.
...and 6 more figures
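
The snapshot-and-reset rule described in the Figure 2 caption can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `LeagueAgent` and `maybe_snapshot`, and the threshold values `WIN_RATE_THRESHOLD` and `TIMEOUT_HOURS`, are hypothetical, since the caption does not state them. Only the three roles and the 25% reset probability for the league exploiter come from the caption.

```python
import random
from dataclasses import dataclass

# Hypothetical thresholds: the caption names a win rate condition and a
# timeout but does not give their values.
WIN_RATE_THRESHOLD = 0.7
TIMEOUT_HOURS = 4.0
# Stated in the Figure 2 caption.
LEAGUE_EXPLOITER_RESET_PROB = 0.25


@dataclass
class LeagueAgent:
    name: str
    role: str  # "main", "main_exploiter", or "league_exploiter"
    generation: int = 0  # number of snapshots taken so far
    reset_count: int = 0  # number of times the weights were reset
    hours_since_snapshot: float = 0.0


def maybe_snapshot(agent, win_rate, policy_pool, rng):
    """Apply the snapshot/reset rule; return (snapshotted, was_reset)."""
    condition_met = (
        win_rate >= WIN_RATE_THRESHOLD
        or agent.hours_since_snapshot >= TIMEOUT_HOURS
    )
    if not condition_met:
        return (False, False)

    # A frozen copy of the agent joins the policy pool.
    policy_pool.append(f"{agent.name}_g{agent.generation}")
    agent.generation += 1
    agent.hours_since_snapshot = 0.0

    # The main agent is never reset, the main exploiter always is,
    # and the league exploiter is reset with probability 0.25.
    if agent.role == "main_exploiter":
        was_reset = True
    elif agent.role == "league_exploiter":
        was_reset = rng.random() < LEAGUE_EXPLOITER_RESET_PROB
    else:
        was_reset = False

    if was_reset:
        agent.reset_count += 1  # stand-in for re-initializing weights
    return (True, was_reset)
```

In a training loop, each agent would call `maybe_snapshot` after its win rate is re-estimated, so the policy pool grows with frozen copies while the exploiters periodically restart.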
