Game-Theoretic Modeling of Stealthy Intrusion Defense against MDP-Based Attackers

Willie Kouam, Stefan Rass

TL;DR

This study models APT evolution as a strategic interaction between an attacker and a defender on an attack graph, and investigates this interaction under three informational regimes that reflect varying levels of attacker knowledge prior to action.

Abstract

The rapid expansion of Internet use has increased system exposure to cyber threats, with advanced persistent threats (APTs) being especially challenging due to their stealth, prolonged duration, and multi-stage attacks targeting high-value assets. In this study, we model APT evolution as a strategic interaction between an attacker and a defender on an attack graph. With limited information about the attacker's position and progress, the defender acts at random intervals by deploying intrusion detection sensors across the network. Once a compromise is detected, affected components are immediately secured through measures such as backdoor removal, patching, or system reconfiguration. Meanwhile, the attacker begins with reconnaissance and then proceeds through the network, exploiting vulnerabilities and installing backdoors to maintain persistent access and adaptive movement. Furthermore, the attacker may take several steps between consecutive defensive operations, resulting in an asymmetric temporal dynamic. The defender's goal is to reduce the likelihood that the attacker will gain access to a critical asset, whereas the attacker's purpose is to increase this likelihood. We investigate this interaction under three informational regimes, reflecting varying levels of attacker knowledge prior to action: (i) a Stackelberg scenario, in which the attacker has full knowledge of the defender's strategy and can optimize accordingly; (ii) a blind regime, where the attacker has no information and assumes uniform beliefs about defensive deployments; and (iii) a belief-based framework, where the attacker holds accurate probabilistic beliefs about the defender's actions. For each regime, we derive optimal defensive strategies by solving the corresponding optimization problems.
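
To make the three informational regimes concrete, here is a minimal Python sketch on a toy attack graph. Everything in it is an illustrative assumption rather than the paper's actual model: the two paths, the node names, the single-sensor defender, the perfect-detection rule, and all the numbers are hypothetical. The attacker picks the path that looks best under its belief about sensor placement, and the outcome is then scored against the defender's true randomized placement.

```python
# Toy illustration only: graph, numbers, and detection rule are hypothetical
# assumptions, not taken from the paper.

# Each attack path is the list of nodes the attacker must traverse
# before reaching the critical asset.
PATHS = {
    "short": ["a", "b"],
    "long": ["c", "d", "e"],
}
NODES = ["a", "b", "c", "d", "e"]

# Assumed true defender strategy: one sensor, placed on a node drawn
# from this distribution.
TRUE_DIST = {"a": 0.5, "b": 0.3, "c": 0.2, "d": 0.0, "e": 0.0}

# Blind regime: the attacker assumes a uniform placement.
UNIFORM = {v: 1.0 / len(NODES) for v in NODES}

# Belief-based regime: a hypothetical, non-uniform attacker belief.
BELIEF = {"a": 0.4, "b": 0.3, "c": 0.1, "d": 0.1, "e": 0.1}


def success_probability(path, sensor_dist):
    """Probability of traversing `path` undetected.

    Assumes a single sensor that always detects an attacker passing
    through its node, so detection probability is the total sensor
    mass on the path.
    """
    return 1.0 - sum(sensor_dist.get(v, 0.0) for v in path)


def best_path(belief):
    """Name of the path that maximizes success under the attacker's belief."""
    return max(PATHS, key=lambda name: success_probability(PATHS[name], belief))


def realized_success(true_dist, belief):
    """Success of the belief-optimal path against the true placement."""
    return success_probability(PATHS[best_path(belief)], true_dist)


if __name__ == "__main__":
    regimes = {
        "stackelberg (knows true strategy)": TRUE_DIST,
        "blind (uniform belief)": UNIFORM,
        "belief-based (hypothetical belief)": BELIEF,
    }
    for name, belief in regimes.items():
        print(name, best_path(belief), round(realized_success(TRUE_DIST, belief), 3))
```

With these made-up numbers, the fully informed attacker avoids the heavily monitored short path and succeeds with probability 0.8. The blind attacker prefers the short path because a uniform belief puts less sensor mass on it, and succeeds with probability of only about 0.2. A defender optimizing its placement would additionally search over `TRUE_DIST` to minimize the attacker's realized success, which this sketch does not attempt.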

Paper Structure

This paper contains 25 sections, 3 theorems, 43 equations, 8 figures, 3 tables.

Key Result

Lemma 1

In our model, assuming that the attacker follows a policy $\pi$ generating the path $\pi:\; p_0 = v_0 \to p_1 \to p_2 \to \cdots \to p_{L_\pi} = v_t$, of length $L_{\pi}$ (number of edges from $v_0$ to $v_t$ when following $\pi$), the value of the MDP at the initial state $\boldsymbol{s_0} = (v, 0)

Figures (8)

  • Figure 1: Illustration of the temporal and informational structure of the game
  • Figure 2: Transition dynamics. Short-path success: 0.80; long-path success: 0.35.
  • Figure 3: MARA network topology. Visualization of the defender’s optimal node allocation across the three analyzed frameworks. Red circles indicate the Stackelberg-optimal deployment, blue circles correspond to the blind strategy, and black circles represent the Dirichlet-based strategy.
  • Figure 4: Comparison of defensive strategies under varying attacker information: MARA case. The results show that the optimal defense consistently achieves the lowest attacker success probability by prioritizing nodes closest to the targets. Both shortest-path and random heuristics yield significantly higher breach probabilities, confirming the benefit of strategic allocation under uncertainty. However, when both players follow a Dirichlet belief structure, applying the Stackelberg policy directly may reduce defensive performance, illustrating the sensitivity of outcomes to modeling assumptions.
  • Figure 5: Comparison of defensive strategies under varying attacker information: MiR100 case. Due to low path diversity and the presence of dominant bottlenecks (notably node 15), all three game-theoretic frameworks converge to identical performance curves. Topology dominates belief assumptions: once key points are protected, attacker behavior becomes effectively unique. This case illustrates that in highly constrained graphs, identifying structural bottlenecks is sufficient to recover the full benefit of strategic optimization.
  • ...and 3 more figures

Theorems & Definitions (9)

  • Definition 1: Attack MDP structure
  • Lemma 1
  • proof
  • Definition 2
  • Remark 1
  • Lemma 2
  • proof
  • Theorem 1
  • proof