Adversarial Reinforcement Learning Framework for ESP Cheater Simulation

Inkyu Park, Jeong-Gwan Lee, Taehwan Kwon, Juheon Choi, Seungku Kim, Junsu Kim, Kimin Lee

TL;DR

The paper tackles the challenge of detecting ESP cheating in games, where ground-truth labels are scarce and cheaters adapt to detectors. It introduces an ESP cheater simulation framework that co-evolves a cheater, a non-cheater, and a trajectory-based detector within a minimax reinforcement learning setup, including a structured cheater model that can switch between cheating and non-cheating based on detection risk. Through Gridworld and Blackjack experiments, the study demonstrates that adversarial training yields adaptive cheating behaviors that balance reward optimization and evasion, while detectors improve or degrade accordingly, revealing vulnerabilities of static detectors. The framework offers a controllable platform to study adaptive cheating and to develop more robust cheat detectors, with extensions to more complex games and multi-agent settings envisioned for future work.

Abstract

Extra-Sensory Perception (ESP) cheats, which reveal hidden in-game information such as enemy locations, are difficult to detect because their effects are not directly observable in player behavior. The lack of observable evidence makes it difficult to collect reliably labeled data, which is essential for training effective anti-cheat systems. Furthermore, cheaters often adapt their behavior by limiting or disguising their cheat usage, which further complicates detection and detector development. To address these challenges, we propose a simulation framework for controlled modeling of ESP cheaters, non-cheaters, and trajectory-based detectors. We model cheaters and non-cheaters as reinforcement learning agents with different levels of observability, while detectors classify their behavioral trajectories. Next, we formulate the interaction between the cheater and the detector as an adversarial game, allowing both players to co-adapt over time. To reflect realistic cheater strategies, we introduce a structured cheater model that dynamically switches between cheating and non-cheating behaviors based on detection risk. Experiments demonstrate that our framework successfully simulates adaptive cheater behaviors that strategically balance reward optimization and detection evasion. This work provides a controllable and extensible platform for studying adaptive cheating behaviors and developing effective cheat detectors.
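
One way to write down the adversarial game described above is sketched here. This is an illustrative form under stated assumptions, not necessarily the paper's exact objective: $D_\phi(\tau)$ is assumed to be the detector's estimated probability that trajectory $\tau$ came from a cheater, $R(\tau)$ the game return, $\pi_c$ and $\pi_{nc}$ the cheater and non-cheater policies, and $\lambda \ge 0$ the adversarial coefficient that weights the detection penalty.

```latex
% Cheater: maximize game reward minus a lambda-weighted detection penalty (assumed form)
\max_{\pi_c} \; \mathbb{E}_{\tau \sim \pi_c}\!\left[ R(\tau) - \lambda \, D_\phi(\tau) \right]

% Detector: minimize a binary cross-entropy loss over cheater and non-cheater trajectories (assumed form)
\min_{\phi} \; \mathbb{E}_{\tau \sim \pi_c}\!\left[ -\log D_\phi(\tau) \right]
          + \mathbb{E}_{\tau \sim \pi_{nc}}\!\left[ -\log\!\left( 1 - D_\phi(\tau) \right) \right]
```

Under this reading, $\lambda = 0$ recovers a cheater that ignores the detector, while larger $\lambda$ trades reward for evasion.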

Paper Structure

This paper contains 28 sections, 14 equations, 12 figures, 4 tables, 1 algorithm.

Figures (12)

  • Figure 1: Overview of the ESP cheater simulation framework. The cheat detector classifies whether a trajectory was generated by a non-cheater or a cheater. After the detector makes its decision, the cheater updates its policy based on the detection result in order to evade detection. Simultaneously, the cheat detector updates itself to improve its classification accuracy.
  • Figure 2: Actor-critic model architecture of the cheater. The model consists of three components: the non-cheater, the pure cheater, and the gating network. Only the gating network is trainable. The gating network produces an interpolation weight $\omega$ used to combine the two policies, as well as the residual value $V_d = V_c - V_c^{(p)}$, which corresponds to the expected penalty imposed by the detector.
  • Figure 3: (a) Visualization of the Gridworld environment. Walls are shown in gray, the agent as a red triangle, items as green circles, and lava as orange regions with black wave patterns. Only the $3\times3$ region in front of the agent is visible. (b) Visualization of the Blackjack environment. Cards shown in parentheses are invisible to the non-cheater.
  • Figure 4: Performance metrics as functions of the adversarial coefficient $\lambda$. We plot results for two settings: updating both the cheater and the detector (Joint) and updating only the cheater against a fixed detector (Cheater-only). As $\lambda$ increases, the cheater becomes harder to detect (lower AP and AUROC). The average reward decreases gradually, showing that the cheater sacrifices efficiency to avoid detection. Average trajectory length increases as the cheater acts less efficiently to appear less suspicious.
  • Figure 5: Reward versus detectability. The figures can be read in two ways. (1) At an equal level of detectability, the cheater obtains less reward against the adversarially trained detector than against the fixed detector. (2) At an equal level of cheater reward, the adversarially trained detector achieves higher detectability than the fixed detector.
  • ...and 7 more figures
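
The gating mechanism in the Figure 2 caption can be illustrated with a minimal sketch. It assumes the interpolation is a convex combination of action probabilities and that $\omega = 1$ selects the pure cheater; the caption does not state either detail. The names `mix_policies` and `cheater_value` and the example distributions are hypothetical. The value identity simply rearranges $V_d = V_c - V_c^{(p)}$ from the caption.

```python
from typing import List, Sequence


def mix_policies(
    p_non_cheater: Sequence[float],
    p_pure_cheater: Sequence[float],
    omega: float,
) -> List[float]:
    """Convex combination of two action distributions.

    Assumption: omega = 1 means "act like the pure cheater" and
    omega = 0 means "act like the non-cheater".
    """
    if not 0.0 <= omega <= 1.0:
        raise ValueError("omega must lie in [0, 1]")
    if len(p_non_cheater) != len(p_pure_cheater):
        raise ValueError("both policies must cover the same action set")
    return [
        omega * pc + (1.0 - omega) * pn
        for pn, pc in zip(p_non_cheater, p_pure_cheater)
    ]


def cheater_value(v_play: float, v_detect: float) -> float:
    """Rearranged caption identity: V_c = V_c^(p) + V_d."""
    return v_play + v_detect


# Hypothetical 4-action example: the non-cheater is uncertain,
# the pure cheater knows which action is best.
pi_nc = [0.25, 0.25, 0.25, 0.25]
pi_pc = [0.0, 1.0, 0.0, 0.0]

mixed = mix_policies(pi_nc, pi_pc, omega=0.5)
print(mixed)  # [0.125, 0.625, 0.125, 0.125]

# A negative residual lowers the cheater's overall value estimate.
print(cheater_value(2.0, -0.5))  # 1.5
```

In this sketch, a gating network that lowers $\omega$ when detection risk is high pushes the mixed policy toward the non-cheater's behavior, which matches the switching behavior described in the abstract.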