Learning to Beat ByteRL: Exploitability of Collectible Card Game Agents
Radovan Haluska, Martin Schmid
TL;DR
The paper analyzes the exploitability of ByteRL in Legends of Code and Magic (LOCM) by building a strong attacker through behaviour cloning followed by reinforcement learning fine-tuning. It demonstrates that ByteRL, while strong, is vulnerable in the LOCM battle phase, and that an end-to-end trained clone can approach or match ByteRL's strength given sufficient data. Behaviour cloning alone yields near-parity with ByteRL, and PPO-based fine-tuning on fixed deck pools produces an agent that beats ByteRL across hundreds of decks, with pretraining substantially accelerating learning. The results highlight practical vulnerabilities in state-of-the-art CCG agents and point to future work on draft-stage independence, curriculum learning, and larger-scale architectures for robust generalization.
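As a rough illustration of the behaviour cloning step mentioned above, the following is a minimal sketch rather than the authors' implementation. It assumes a linear softmax policy and a synthetic "expert" standing in for ByteRL; the names `behaviour_clone`, `agreement`, and `expert_W`, along with all dimensions and hyperparameters, are hypothetical. The policy is fit by minimizing cross-entropy between its action distribution and the expert's recorded actions.

```python
import numpy as np

def softmax(logits):
    # Numerically stable row-wise softmax.
    z = logits - logits.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def behaviour_clone(states, expert_actions, n_actions, lr=0.5, epochs=200):
    """Fit a linear softmax policy to (state, action) pairs from an expert.

    Plain gradient descent on the mean cross-entropy loss. This is an
    illustrative stand-in, not the architecture used in the paper.
    """
    n, d = states.shape
    W = np.zeros((d, n_actions))
    onehot = np.eye(n_actions)[expert_actions]
    losses = []
    for _ in range(epochs):
        probs = softmax(states @ W)
        # Negative log-likelihood of the expert's actions under the policy.
        nll = -np.log(probs[np.arange(n), expert_actions] + 1e-12)
        losses.append(float(nll.mean()))
        # Gradient of mean cross-entropy with respect to W.
        grad = states.T @ (probs - onehot) / n
        W -= lr * grad
    return W, losses

def agreement(W, states, expert_actions):
    # Fraction of states where the cloned policy picks the expert's action.
    return float(np.mean((states @ W).argmax(axis=1) == expert_actions))

# Toy data: a hypothetical expert that acts greedily on a hidden linear score.
rng = np.random.default_rng(0)
states = rng.normal(size=(500, 8))
expert_W = rng.normal(size=(8, 4))
expert_actions = (states @ expert_W).argmax(axis=1)

W, losses = behaviour_clone(states, expert_actions, n_actions=4)
clone_agreement = agreement(W, states, expert_actions)
```

In a pipeline like the one summarized here, a policy pretrained this way would then serve as the starting point for reinforcement learning fine-tuning such as PPO, which this sketch does not cover.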
Abstract
While poker, as a family of games, has been studied extensively over the past decades, collectible card games have received relatively little attention. Only recently have we seen an agent that can compete with professional human players in Hearthstone, one of the most popular collectible card games. Although artificial agents must be able to work with imperfect information in both of these genres, collectible card games pose another set of distinct challenges. Unlike in many poker variants, agents must deal with a state space so vast that even enumerating all states consistent with the agent's beliefs is intractable, rendering current search methods unusable and requiring agents to opt for other techniques. In this paper, we investigate the strength of such techniques for this class of games. Namely, we present a preliminary analysis of ByteRL, the state-of-the-art agent in Legends of Code and Magic and Hearthstone. Although ByteRL beat a top-10 Hearthstone player from China, we show that its play in Legends of Code and Magic is highly exploitable.
