Playing Non-Embedded Card-Based Games with Reinforcement Learning

Tianyang Wu, Lipeng Wan, Yuhang Wang, Qiang Wan, Xuguang Lan

TL;DR

This work tackles the challenge of building non-embedded, real-time AI agents for complex card-based RTS games like Clash Royale, where agents must rely on noisy visual inputs rather than exact game state. The authors propose an offline reinforcement learning framework that uses visual perception outputs, a generative dataset for object detection, and a transformer-based decision model (inspired by Decision Transformer and StARformer) to fuse perception with action, enabling autonomous play on mobile devices. Key contributions include a generative, AI-assisted labeling pipeline for object detection, evaluation of YOLOv8 variants for fast and accurate unit detection, and a delayed, continuous action prediction strategy with resampling to address data imbalances in offline datasets. The results show the approach can defeat built-in AI in Clash Royale and run in real time on mobile hardware, highlighting the viability of non-embedded offline RL for complex, vision-driven RTS tasks and offering a foundation for further online RL and perception-architecture improvements.

Abstract

Significant progress has been made in AI for games, including board games, MOBA, and RTS games. However, complex agents are typically developed in an embedded manner, directly accessing game state information, unlike human players who rely on noisy visual data, leading to unfair competition. Developing complex non-embedded agents remains challenging, especially in card-based RTS games with complex features and large state spaces. We propose a non-embedded offline reinforcement learning training strategy using visual inputs to achieve real-time autonomous gameplay in the RTS game Clash Royale. Due to the lack of an object detection dataset for this game, we designed an efficient generative object detection dataset for training. We extract features using state-of-the-art object detection and optical character recognition models. Our method enables real-time image acquisition, perception feature fusion, decision-making, and control on mobile devices, successfully defeating built-in AI opponents. All code is open-sourced at https://github.com/wty-yy/katacr.

Paper Structure

This paper contains 22 sections, 8 equations, 6 figures, 3 tables, 1 algorithm.

Figures (6)

  • Figure 1: Information Flow Transmission Diagram.
  • Figure 2: Game Scenario.
  • Figure 3: The process of building a generative dataset for object detection.
  • Figure 4: Decision model architecture: A spatial attention mechanism on the left encodes feature information at the same timestep, while a temporal attention mechanism on the right associates information across consecutive frames to predict actions.
  • Figure 5: A segment of data extracted from the offline dataset, containing a total of 5 action frames, with a maximum interval frame threshold $T_{delay} = 20$.
  • ...and 1 more figures
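
The Figure 5 caption refers to action frames and a maximum interval frame threshold $T_{delay} = 20$. As a rough illustration only, the sketch below shows one way a per-frame "delay until the next action" target could be derived from a sequence of frames and capped at that threshold. The function name `delay_targets`, the boolean input encoding, and the capping rule are assumptions made for this example and are not taken from the paper or the katacr repository.

```python
from typing import List

T_DELAY = 20  # maximum interval frame threshold, value taken from the Figure 5 caption


def delay_targets(is_action: List[bool], t_delay: int = T_DELAY) -> List[int]:
    """Hypothetical helper: for each frame, count the frames remaining until
    the next action frame, capped at t_delay.

    Frames with no upcoming action inside the segment receive t_delay.
    """
    targets = [t_delay] * len(is_action)
    frames_to_next = t_delay  # scanning backwards, no action has been seen yet
    for i in range(len(is_action) - 1, -1, -1):
        if is_action[i]:
            frames_to_next = 0  # an action happens on this frame
        else:
            frames_to_next = min(frames_to_next + 1, t_delay)
        targets[i] = frames_to_next
    return targets


if __name__ == "__main__":
    # Toy segment: one action on the third frame.
    print(delay_targets([False, False, True, False]))
```

Under this assumed encoding, every frame carries a training signal instead of only the sparse action frames, which is one plausible reading of how a delay-based target can ease the imbalance between action and non-action frames in an offline dataset.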