Enhancing Player Enjoyment with a Two-Tier DRL and LLM-Based Agent System for Fighting Games

Shouren Wang, Zehua Jiang, Fernando Sliva, Sam Earle, Julian Togelius

TL;DR

This work introduces a two-tier agent (TTA) framework to boost player enjoyment in a traditional fighting game, Street Fighter II Champion Edition. The first tier builds diverse DRL agents via a modularized reward function and hybrid self-play training, enabling distinct play styles and mastery of advanced techniques. The second tier employs a Large Language Model hyper-agent (LLMHA) to dynamically select opponents based on players' data and feedback, enhancing matchmaking diversity and engagement. Experimental results show substantial gains in advanced move usage (up to 156.36%) and notable improvements in player enjoyment metrics from a small user study, supporting the practical value of integrating DRL with reasoning-enabled hyper-agents for real-time opponent selection. The study highlights both the promise and challenges of modeling timing-sensitive skills and constructing robust, scalable prompts for LLM-based decision making in interactive games.

Abstract

Deep reinforcement learning (DRL) has effectively enhanced gameplay experiences and game design across various game genres. However, few studies on fighting game agents have focused explicitly on enhancing player enjoyment, a critical factor for both developers and players. To address this gap and establish a practical baseline for designing enjoyability-focused agents, we propose a two-tier agent (TTA) system and conduct experiments in the classic fighting game Street Fighter II. The first tier of TTA employs a task-oriented network architecture, modularized reward functions, and hybrid training to produce diverse and skilled DRL agents. In the second tier of TTA, a Large Language Model Hyper-Agent (LLMHA), leveraging players' playing data and feedback, dynamically selects suitable DRL opponents. In addition, we investigate and model several key factors that affect the enjoyability of the opponent. The experiments demonstrate improvements of 64.36% to 156.36% in the execution of advanced skills over baseline methods. The trained agents also exhibit distinct game-playing styles. Additionally, we conducted a small-scale user study, and the overall enjoyment reported in the players' feedback validates the effectiveness of our TTA system.
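To make the idea of a modularized reward function more concrete, the following is a minimal sketch, not the authors' implementation. It assumes a reward is a weighted sum of independent terms, so that a different play style comes from a different weight set. The `StepInfo` fields, the term names, and all weights are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass(frozen=True)
class StepInfo:
    """Hypothetical per-step game signals; the field names are assumptions."""
    damage_dealt: float
    damage_taken: float
    special_move_landed: bool
    distance_to_opponent: float  # assumed to be normalized to [0, 1]


RewardTerm = Callable[[StepInfo], float]


def hp_term(s: StepInfo) -> float:
    # HP-based signal: reward damage dealt, penalize damage taken.
    return s.damage_dealt - s.damage_taken


def special_move_term(s: StepInfo) -> float:
    # Bonus whenever a special move lands on this step.
    return 1.0 if s.special_move_landed else 0.0


def distance_term(s: StepInfo) -> float:
    # Encourages keeping distance, as a defensive style might.
    return s.distance_to_opponent


TERMS: Dict[str, RewardTerm] = {
    "hp": hp_term,
    "special_move": special_move_term,
    "distance": distance_term,
}


def make_reward(weights: Dict[str, float]) -> Callable[[StepInfo], float]:
    """Build a reward function as a weighted sum of the named terms."""
    unknown = set(weights) - set(TERMS)
    if unknown:
        raise KeyError(f"unknown reward terms: {sorted(unknown)}")

    def reward(s: StepInfo) -> float:
        return sum(w * TERMS[name](s) for name, w in weights.items())

    return reward


# Illustrative weight sets only; the paper's actual values are not given here.
default_reward = make_reward({"hp": 1.0})
special_reward = make_reward({"hp": 1.0, "special_move": 0.5})
defensive_reward = make_reward({"hp": 1.0, "distance": 0.25})
```

Under this sketch, adding a new play style means registering one more term and choosing weights, while the training loop itself stays unchanged.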

Paper Structure

This paper contains 39 sections, 5 equations, 8 figures, 4 tables.

Figures (8)

  • Figure 1: Game interface of Street Fighter II: Champion Edition. The Player 1 (left) and Player 2 (right) labels indicate the character controlled by each player. The health point (HP) bars display each player's remaining health, with yellow representing current HP and red indicating lost HP. The timer in the center shows the remaining duration of the round, and the area at the top shows match information such as the score.
  • Figure 2: Overview of the TTA system: (a) Network architecture of the DRL agent; (b) LLM Hyper-Agent opponent selection process. (a) The network processes three types of inputs: game pixels, scalar information, and an action history spanning the past $100$ steps. The extracted features are fed into an actor-critic network, where the actor net (MLP) produces a 12-dimensional multi-binary action probability distribution, and the critic net estimates the value function (used only during training). The feature extractor consists of a CNN module (ResNet18) for visual feature extraction and an RNN module (LSTM) for learning sequential dependencies, particularly for executing special moves. Scalar information, including character ID and game states, is concatenated with the extracted features before being processed by the actor-critic network. (b) LLMHA selection pipeline, which dynamically selects DRL agent opponents for the player based on their match feedback and playing history. The GM maintains a record of the player's playing data (e.g., win rate, previous opponents) and, after each match, prompts the player for feedback. This feedback is then integrated into the playing data and passed to the LLMHA. The LLMHA embeds the playing data into a prompt template and uses it to infer the most suitable opponent, ensuring an adaptive and personalized experience.
  • Figure 3: Average number of special moves performed per round by DRL agents trained with three approaches. The "baseline" method uses a CNN+MLP architecture with HP-based rewards [sf_go2023phase]. The "default reward" method adopts our proposed DRL network while keeping HP-based rewards. The "special move reward" method further incorporates reward terms tailored for special moves (see the reward function section). Compared to the baseline, the "default reward" method improves special move usage by 64.36%, and the "special move reward" method achieves a 156.36% increase, demonstrating the effectiveness of our network architecture and reward design.
  • Figure 4: Win rate comparison between our DRL agent and the baseline model [sf_go2023phase]. The figure presents the results of 12 matches. Regardless of the reward function used, our model outperformed the baseline with a 66.7% win rate. This indicates that our network architecture improves the DRL agent's performance in competitive matches.
  • Figure 5: Comparison of key playing behaviors between DRL agents trained with the default reward and the defensive reward. The figure presents two behavioral metrics: average distance from the opponent and projectile usage rate. The defensive reward agent maintains a noticeably greater distance from the opponent and uses projectiles at a slightly higher rate compared to the default reward agent. This demonstrates that our modularized reward function, incorporating customized reward terms, effectively guides DRL agents to learn the intended distinct play style.
  • ...and 3 more figures
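The LLMHA selection pipeline described in the Figure 2(b) caption can be sketched roughly as follows. This is an illustration under stated assumptions, not the paper's prompt or code: the `PlayingData` fields, the template wording, the agent names, and the fallback rule are all hypothetical, and the LLM is represented by a plain callable so that no specific model API is implied.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class PlayingData:
    """Hypothetical record kept between matches; the fields are assumptions."""
    wins: int = 0
    losses: int = 0
    previous_opponents: List[str] = field(default_factory=list)
    feedback: List[str] = field(default_factory=list)

    @property
    def win_rate(self) -> float:
        total = self.wins + self.losses
        return self.wins / total if total else 0.0


# Hypothetical template; the paper's actual prompt is not reproduced here.
PROMPT_TEMPLATE = (
    "You select the next opponent for a fighting-game player.\n"
    "Available agents: {agents}\n"
    "Player win rate: {win_rate:.2f}\n"
    "Previous opponents: {previous}\n"
    "Latest feedback: {feedback}\n"
    "Answer with exactly one agent name."
)


def build_prompt(data: PlayingData, agents: List[str]) -> str:
    """Embed the playing data and the latest feedback into the template."""
    return PROMPT_TEMPLATE.format(
        agents=", ".join(agents),
        win_rate=data.win_rate,
        previous=", ".join(data.previous_opponents) or "none",
        feedback=data.feedback[-1] if data.feedback else "none",
    )


def select_opponent(
    data: PlayingData, agents: List[str], llm: Callable[[str], str]
) -> str:
    """Ask the LLM for an agent name and validate its reply."""
    reply = llm(build_prompt(data, agents)).strip()
    # Assumed safeguard: fall back to the first agent on an unknown reply.
    return reply if reply in agents else agents[0]
```

Validating the reply against the list of known agents is a design choice made in this sketch so that a malformed LLM answer never blocks matchmaking.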