Omniscient Attacker in Stochastic Security Games with Interdependent Nodes

Yuksel Arslantas, Ahmed Said Donmez, Ege Yuceel, Muhammed O. Sayin

TL;DR

This work analyzes the vulnerability of reinforcement-learning defenders in stochastic security games to an omniscient attacker who leverages knowledge of the defender’s learning dynamics. Using a linear influence network to model interdependent security assets and neuro-dynamic programming for approximate value iteration, it recasts the problem as an MDP from the attacker’s perspective and proves existence and convergence properties for the solution. Empirically, the omniscient attacker consistently outperforms a naïve defender, with stronger exploitation when the attacker has greater modeling capacity and exploration, underscoring a significant risk in current RL-based defenses. The study motivates development of robust, learning-resilient defense mechanisms for critical infrastructure.

Abstract

The adoption of reinforcement learning for critical infrastructure defense introduces a vulnerability: sophisticated attackers can strategically exploit the defense algorithm's learning dynamics. While prior work addresses this vulnerability in the context of repeated normal-form games, its extension to stochastic games remains an open research gap. We close this gap by examining stochastic security games between an RL defender and an omniscient attacker, utilizing a tractable linear influence network model. To overcome the structural limitations of prior methods, we propose and apply neuro-dynamic programming. Our experimental results demonstrate that the omniscient attacker can significantly outperform a naïve defender, highlighting the critical vulnerability introduced by the learning dynamics and the effectiveness of the proposed strategy.

Paper Structure

This paper contains 8 sections, 17 equations, 4 figures.

Figures (4)

  • Figure 1: A linear influence network illustrating compromised and secured security assets. Green servers represent uncompromised assets, red servers represent compromised assets, and the blue server represents a defended asset. Shaded arrows depict interdependencies between assets: solid arrows indicate usable attack paths, and dashed arrows indicate unusable paths. The solid black arrow represents the attacker's direct attack.
  • Figure 2: Game model of a linear influence network (LIN). The left panel displays a state diagram involving three security assets, where solid lines denote state transitions based on defender and attacker actions, and dashed lines indicate a reset to the original state. The right panel illustrates the network's evolution, where edges represent the influence and vulnerability one asset exerts on another. When an asset is compromised, it is removed from the network and its influence edges are recomputed; securing the asset restores its original connections. Therefore, the influence matrix $I$ is stochastic. By contrast, the vulnerability matrix $V$ is not stochastic, because even a compromised asset continues to affect the vulnerabilities of others.
  • Figure 3: Average discounted rewards of the attacker against different exploration levels.
  • Figure 4: Average discounted rewards of the attacker with different learning models.
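
The network evolution described in the Figure 2 caption (a compromised asset is removed, the remaining influence edges are recomputed, and securing the asset restores the original connections) can be illustrated with a minimal sketch. The details here are assumptions rather than the paper's definitions: the function name `active_influence_matrix` is hypothetical, "stochastic" is read as row-stochastic, and "recomputed" is read as renormalizing each remaining row to sum to 1.

```python
import numpy as np

def active_influence_matrix(base_weights, compromised):
    """Hypothetical helper: influence matrix restricted to uncompromised assets.

    Assumptions (not taken from the paper): rows are renormalized to sum to 1,
    and an asset left with no outgoing weight keeps all influence on itself.
    """
    base = np.asarray(base_weights, dtype=float)
    active = ~np.asarray(compromised, dtype=bool)

    # Drop the rows and columns of compromised assets.
    sub = base[np.ix_(active, active)]

    # Renormalize each remaining row so the matrix stays row-stochastic.
    row_sums = sub.sum(axis=1, keepdims=True)
    out = np.divide(sub, row_sums, out=np.zeros_like(sub), where=row_sums > 0)

    # Isolated assets (all-zero rows) get a self-loop of weight 1.
    idx = np.flatnonzero(row_sums.ravel() == 0)
    out[idx, idx] = 1.0
    return out

# Illustrative three-asset network; the weights are made up for this example.
base = np.array([
    [0.50, 0.30, 0.20],
    [0.10, 0.80, 0.10],
    [0.25, 0.25, 0.50],
])

reduced = active_influence_matrix(base, [False, False, True])   # asset 3 compromised
restored = active_influence_matrix(base, [False, False, False])  # asset 3 secured again
```

Under these assumptions, `reduced` is a 2×2 row-stochastic matrix over the two remaining assets, and `restored` equals the original `base`, mirroring the caption's statement that securing an asset restores its connections. The vulnerability matrix $V$ is not modeled here, since the caption notes it is not stochastic and is not reduced in the same way.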