Adversarial Reinforcement Learning Framework for ESP Cheater Simulation
Inkyu Park, Jeong-Gwan Lee, Taehwan Kwon, Juheon Choi, Seungku Kim, Junsu Kim, Kimin Lee
TL;DR
The paper tackles the challenge of detecting ESP cheating in games, where ground-truth labels are scarce and cheaters adapt to detectors. It introduces an ESP cheater simulation framework that co-evolves a cheater, a non-cheater, and a trajectory-based detector within a minimax reinforcement learning setup, including a structured cheater model that can switch between cheating and non-cheating based on detection risk. Through Gridworld and Blackjack experiments, the study demonstrates that adversarial training yields adaptive cheating behaviors that balance reward optimization and evasion, while detectors improve or degrade accordingly, revealing vulnerabilities of static detectors. The framework offers a controllable platform to study adaptive cheating and to develop more robust cheat detectors, with extensions to more complex games and multi-agent settings envisioned for future work.
Abstract
Extra-Sensory Perception (ESP) cheats, which reveal hidden in-game information such as enemy locations, are difficult to detect because their effects are not directly observable in player behavior. The lack of observable evidence makes it difficult to collect reliably labeled data, which is essential for training effective anti-cheat systems. Furthermore, cheaters often adapt their behavior by limiting or disguising their cheat usage, which further complicates detection and detector development. To address these challenges, we propose a simulation framework for controlled modeling of ESP cheaters, non-cheaters, and trajectory-based detectors. We model cheaters and non-cheaters as reinforcement learning agents with different levels of observability, while detectors classify their behavioral trajectories. Next, we formulate the interaction between the cheater and the detector as an adversarial game, allowing both players to co-adapt over time. To reflect realistic cheater strategies, we introduce a structured cheater model that dynamically switches between cheating and non-cheating behaviors based on detection risk. Experiments demonstrate that our framework successfully simulates adaptive cheater behaviors that strategically balance reward optimization and detection evasion. This work provides a controllable and extensible platform for studying adaptive cheating behaviors and developing effective cheat detectors.
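To make the structured cheater model concrete, the following is a minimal sketch of an agent that switches between a cheating policy (acting on full observations) and a non-cheating policy (acting on partial observations) depending on detection risk. It is not the paper's implementation. The hard threshold rule, the scalar `detection_risk` input, and all names (`make_switching_cheater`, `cheat_policy`, `honest_policy`, `risk_threshold`) are assumptions made for illustration. In the framework described above, the policies and the switching behavior would be learned through adversarial training rather than hand-coded.

```python
from typing import Callable, Sequence, Tuple

Observation = Sequence[float]
Policy = Callable[[Observation], int]


def make_switching_cheater(
    cheat_policy: Policy,
    honest_policy: Policy,
    risk_threshold: float = 0.5,
) -> Callable[[Observation, Observation, float], Tuple[int, bool]]:
    """Build an agent that cheats only while estimated detection risk is low.

    Hypothetical helper: the hard threshold is an illustrative assumption,
    not the switching rule used in the paper.
    """

    def act(
        full_obs: Observation,
        partial_obs: Observation,
        detection_risk: float,
    ) -> Tuple[int, bool]:
        # Assumption: detection_risk is a scalar in [0, 1], for example a
        # detector's estimated probability that the trajectory so far
        # belongs to a cheater.
        if detection_risk >= risk_threshold:
            # High risk: act like a non-cheater, using only partial observations.
            return honest_policy(partial_obs), False
        # Low risk: exploit the hidden information exposed by the ESP cheat.
        return cheat_policy(full_obs), True

    return act


# Toy policies, purely illustrative.
def greedy_on_hidden(obs: Observation) -> int:
    """Pick the index with the largest value in the full observation."""
    return max(range(len(obs)), key=lambda i: obs[i])


def default_action(obs: Observation) -> int:
    """Always take action 0, ignoring the partial observation."""
    return 0


agent = make_switching_cheater(greedy_on_hidden, default_action, risk_threshold=0.5)
low_risk = agent([0.1, 0.9, 0.3], [0.1], 0.2)   # below threshold: cheats
high_risk = agent([0.1, 0.9, 0.3], [0.1], 0.8)  # above threshold: plays honestly
```

Each call returns the chosen action together with a flag indicating whether the cheat was used on that step, which is the kind of per-step ground-truth label a simulation can provide and real game logs usually cannot.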
