Omniscient Attacker in Stochastic Security Games with Interdependent Nodes
Yuksel Arslantas, Ahmed Said Donmez, Ege Yuceel, Muhammed O. Sayin
TL;DR
This work analyzes the vulnerability of reinforcement-learning (RL) defenders in stochastic security games to an omniscient attacker who exploits knowledge of the defender’s learning dynamics. Using a linear influence network to model interdependent security assets and neuro-dynamic programming for approximate value iteration, it recasts the problem as an MDP from the attacker’s perspective and proves existence and convergence properties for the solution. Empirically, the omniscient attacker consistently outperforms a naive defender, with stronger exploitation when the attacker has greater modeling capacity and exploration, underscoring a significant risk in current RL-based defenses. The study motivates the development of robust, learning-resilient defense mechanisms for critical infrastructure.
Abstract
The adoption of reinforcement learning (RL) for critical infrastructure defense introduces a vulnerability: sophisticated attackers can strategically exploit the defense algorithm's learning dynamics. While prior work addresses this vulnerability in repeated normal-form games, its extension to stochastic games remains open. We close this gap by examining stochastic security games between an RL defender and an omniscient attacker, using a tractable linear influence network model. To overcome the structural limitations of prior methods, we propose and apply a neuro-dynamic programming approach. Our experimental results demonstrate that the omniscient attacker can significantly outperform a naive defender, highlighting both the critical vulnerability introduced by the learning dynamics and the effectiveness of the proposed strategy.
