Learning to Beat ByteRL: Exploitability of Collectible Card Game Agents

Radovan Haluska, Martin Schmid

TL;DR

The paper analyzes the exploitability of ByteRL in Legends of Code and Magic (LOCM) by building a strong attacker through behaviour cloning followed by reinforcement learning fine-tuning. It demonstrates that ByteRL, while strong, is vulnerable in the LOCM battle phase, and that an end-to-end trained clone can approach or match ByteRL's strength given sufficient data. Behaviour cloning alone yields near-parity with ByteRL, and PPO-based fine-tuning on fixed deck pools enables beating ByteRL across hundreds of decks, with pretraining substantially accelerating learning. The results highlight practical vulnerabilities in state-of-the-art collectible card game (CCG) agents and point to future work on draft-stage independence, curriculum learning, and larger-scale architectures for robust generalization.

Abstract

While poker, as a family of games, has been studied extensively in recent decades, collectible card games have received relatively little attention. Only recently have we seen an agent that can compete with professional human players in Hearthstone, one of the most popular collectible card games. Although artificial agents must be able to work with imperfect information in both of these genres, collectible card games pose another set of distinct challenges. Unlike in many poker variants, agents must deal with a state space so vast that even enumerating all states consistent with the agent's beliefs is intractable, rendering current search methods unusable and requiring the agents to opt for other techniques. In this paper, we investigate the strength of such techniques for this class of games. Namely, we present a preliminary analysis of ByteRL, the state-of-the-art agent in Legends of Code and Magic and Hearthstone. Although ByteRL beat a top-10 Hearthstone player from China, we show that its play in Legends of Code and Magic is highly exploitable.

Paper Structure (16 sections, 3 figures, 1 table)

Figures (3)

  • Figure 1: The figure shows training, validation and win rate curves for selected behaviour cloning experiments. For the first two plots, the horizontal axis shows the training iteration, and the vertical axis shows the accuracy. For the last plot, the horizontal axis shows the evaluation round, and the vertical axis shows the win rate against ByteRL.
  • Figure 2: The figure shows the results of the evaluations against ByteRL using the best-performing checkpoint for each behaviour-cloning experiment. The horizontal axis contains the names of the experiments, and the vertical axis shows the win rates.
  • Figure 3: The figure shows how win rates develop during training. For each deck pool size, two curves with 95% confidence intervals are shown: one where the policy network was initialized with the pre-trained weights and one where the weights were initialized randomly. Each curve is an average of at most five runs. Because training finished earlier on some seeds than on others, vertical bars mark the points where the number of contributing runs decreased by one.
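
The aggregation described for Figure 3 (averaging up to five runs of unequal length, with 95% confidence intervals and markers where a run drops out) can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `aggregate_runs`, the input layout (one list of win rates per seed), and the use of a Student-t interval are all assumptions, since the source does not say how the intervals were computed.

```python
import math
import statistics

# Two-sided 95% Student-t critical values, keyed by degrees of freedom (n - 1).
# Assumption: a t-interval is used; the source does not specify the method.
T_CRIT_95 = {1: 12.706, 2: 4.303, 3: 3.182, 4: 2.776}


def aggregate_runs(runs):
    """Average win-rate curves of unequal length (hypothetical helper).

    Returns (means, half_widths, drop_steps), where drop_steps lists the
    step indices at which the number of contributing runs decreased.
    """
    if not runs or len(runs) > 5:
        raise ValueError("expected between one and five runs")
    means, half_widths, drop_steps = [], [], []
    prev_n = len(runs)
    for step in range(max(len(r) for r in runs)):
        # Only runs that are still training contribute at this step.
        values = [r[step] for r in runs if step < len(r)]
        n = len(values)
        if n < prev_n:
            drop_steps.append(step)
        prev_n = n
        means.append(statistics.fmean(values))
        if n > 1:
            sem = statistics.stdev(values) / math.sqrt(n)
            half_widths.append(T_CRIT_95[n - 1] * sem)
        else:
            half_widths.append(float("nan"))  # no interval from a single run
    return means, half_widths, drop_steps
```

With so few runs per curve, the interval widens sharply each time a seed finishes early, which is one reason to mark those steps explicitly on the plot.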