Game-Theoretic Modeling of Stealthy Intrusion Defense against MDP-Based Attackers

Willie Kouam; Stefan Rass

TL;DR

This study models APT evolution as a strategic interaction between an attacker and a defender on an attack graph and investigates this interaction under three informational regimes that reflect varying levels of attacker knowledge prior to action.

Abstract

The rapid expansion of Internet use has increased system exposure to cyber threats, with advanced persistent threats (APTs) being especially challenging due to their stealth, prolonged duration, and multi-stage attacks targeting high-value assets. In this study, we model APT evolution as a strategic interaction between an attacker and a defender on an attack graph. With limited information about the attacker's position and progress, the defender acts at random intervals by deploying intrusion detection sensors across the network. Once a compromise is detected, affected components are immediately secured through measures such as backdoor removal, patching, or system reconfiguration. Meanwhile, the attacker begins with reconnaissance and then proceeds through the network, exploiting vulnerabilities and installing backdoors to maintain persistent access and adaptive movement. Furthermore, the attacker may take several steps between consecutive defensive operations, resulting in an asymmetric temporal dynamic. The defender's goal is to reduce the likelihood that the attacker will gain access to a critical asset, whereas the attacker's purpose is to increase this likelihood. We investigate this interaction under three informational regimes, reflecting varying levels of attacker knowledge prior to action: (i) a Stackelberg scenario, in which the attacker has full knowledge of the defender's strategy and can optimize accordingly; (ii) a blind regime, where the attacker has no information and assumes uniform beliefs about defensive deployments; and (iii) a belief-based framework, where the attacker holds accurate probabilistic beliefs about the defender's actions. For each regime, we derive optimal defensive strategies by solving the corresponding optimization problems.
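The three informational regimes differ only in which belief the attacker best-responds to, while its actual success is always measured against the defender's true strategy. The sketch below scores one fixed defender strategy under each regime. It rests on simplifying assumptions that are not taken from the paper: a hypothetical five-node DAG (`GRAPH`), one randomized sensor per round that always detects an attacker passing through the sensed node, a single attacker move per round (so the asymmetric multi-step timing is ignored), and, for regime (iii), beliefs drawn from a Dirichlet distribution centered on the true strategy with a hypothetical `concentration` parameter. All function names are illustrative. It does not solve the defender's optimization problem; it only evaluates a given strategy.

```python
import random

# Hypothetical toy attack graph (not a network from the paper): a small DAG
# from the entry node "v0" to the critical asset "vt".
GRAPH = {
    "v0": ["a", "b"],
    "a": ["c", "vt"],
    "b": ["c"],
    "c": ["vt"],
    "vt": [],
}
# Candidate sensor locations; entry and target are excluded in this sketch.
NODES = ["a", "b", "c"]


def all_paths(graph, src, dst):
    """Enumerate every src -> dst path (adequate for tiny DAGs only)."""
    if src == dst:
        return [[dst]]
    return [[src] + rest for nxt in graph[src] for rest in all_paths(graph, nxt, dst)]


def success(path, strategy):
    """Probability that a single randomized sensor misses every node on the path.

    Simplifying assumption: one sensor per round, placed on node n with
    probability strategy[n], and a sensed node always detects the attacker.
    """
    return 1.0 - sum(strategy.get(n, 0.0) for n in path)


def best_response(paths, belief):
    """The attacker picks the path it believes is most likely to succeed."""
    return max(paths, key=lambda p: success(p, belief))


def dirichlet_sample(weights, concentration, rng):
    """Draw a belief from Dirichlet(concentration * weights) via normalized gammas."""
    draws = {n: rng.gammavariate(concentration * w, 1.0) for n, w in weights.items()}
    total = sum(draws.values())
    return {n: x / total for n, x in draws.items()}


def regime_outcomes(strategy, concentration=20.0, samples=2000, seed=0):
    """Attacker success against the true `strategy` under three belief regimes."""
    paths = all_paths(GRAPH, "v0", "vt")
    uniform = {n: 1.0 / len(NODES) for n in NODES}
    rng = random.Random(seed)
    # (i) Stackelberg: the attacker best-responds to the true defender strategy.
    stackelberg = success(best_response(paths, strategy), strategy)
    # (ii) Blind: the attacker best-responds to a uniform belief.
    blind = success(best_response(paths, uniform), strategy)
    # (iii) Belief-based: each belief is a Dirichlet draw centered on the truth.
    belief = sum(
        success(best_response(paths, dirichlet_sample(strategy, concentration, rng)), strategy)
        for _ in range(samples)
    ) / samples
    return {"stackelberg": stackelberg, "blind": blind, "belief": belief}


print(regime_outcomes({"a": 0.6, "b": 0.1, "c": 0.3}))
```

With the strategy `{"a": 0.6, "b": 0.1, "c": 0.3}`, the blind attacker takes the short route through `a`, which the defender covers most heavily, and succeeds with probability 0.4; the Stackelberg attacker detours through `b` and `c` and succeeds with probability 0.6; the Dirichlet regime averages to a value between the two.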

Paper Structure (25 sections, 3 theorems, 43 equations, 8 figures, 3 tables)

Introduction
Motivation and Model overview
Notation and symbols
Motivation
Model Overview
Network model and attack structure
Perfect Information (Stackelberg Game)
Problem formulation
Stackelberg Equilibrium
Blind Attacker with No Information
Uniform Belief Model
Blind attacker equilibrium computation
Belief-based defense under Dirichlet uncertainty
Dirichlet model for belief uncertainty
Dirichlet vs Stackelberg: advantages of a Dirichlet-robust approach
...and 10 more sections

Key Result

Lemma 1

In our model, assuming that the attacker follows a policy $\pi$ generating the path $\pi:\; p_0 = v_0 \to p_1 \to p_2 \to \cdots \to p_{L_\pi} = v_t$ of length $L_{\pi}$ (the number of edges from $v_0$ to $v_t$ when following $\pi$), the value of the MDP at the initial state $\boldsymbol{s_0} = (v, 0)$ …
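Lemma 1 concerns the value of the attack MDP at $\boldsymbol{s_0}$ when the attacker follows a fixed path $\pi$. The exact expression is not reproduced in this excerpt, so the sketch below only illustrates, under assumed dynamics that are not taken from the paper, how backward induction along a fixed path collapses to a product of per-step probabilities. The assumptions: on step $i$ the exploit of edge $p_{i-1} \to p_i$ succeeds with a hypothetical probability $q_i$, the attacker is then detected at $p_i$ with a hypothetical probability $d_i$, failure and detection are absorbing with value 0, and reaching $v_t$ has value 1, which gives $V(\boldsymbol{s_0}) = \prod_{i=1}^{L_\pi} q_i (1 - d_i)$. The function names are illustrative.

```python
from math import isclose, prod


def path_value(edge_success, detect):
    """Value of the start state when the attacker follows a fixed path.

    Hypothetical model (not the paper's Lemma 1): step i succeeds with
    probability edge_success[i-1] and is then detected with probability
    detect[i-1]; failure is absorbing (value 0), the target has value 1.
    """
    if len(edge_success) != len(detect):
        raise ValueError("need one exploit and one detection probability per edge")
    value = 1.0  # value at the target state (p_L, L)
    # Backward induction: V_i = q_{i+1} * (1 - d_{i+1}) * V_{i+1}
    for q, d in zip(reversed(edge_success), reversed(detect)):
        value *= q * (1.0 - d)
    return value


def closed_form(edge_success, detect):
    """Product form that the backward recursion collapses to."""
    return prod(q * (1.0 - d) for q, d in zip(edge_success, detect))


q = [0.9, 0.8, 0.95]  # hypothetical exploit success per edge
d = [0.1, 0.2, 0.0]   # hypothetical detection probability per reached node
assert isclose(path_value(q, d), closed_form(q, d))
```

Because every factor is at most 1, each additional edge can only lower the value under these assumptions, which is one way to read why longer paths are less attractive to the attacker.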

Figures (8)

Figure 1: Illustration of the temporal and informational structure of the game
Figure 2: Transition dynamics. Short-path success: 0.80; long-path success: 0.35.
Figure 3: MARA network topology. Visualization of the defender’s optimal node allocation across the three analyzed frameworks. Red circles indicate the Stackelberg-optimal deployment, blue circles correspond to the blind strategy, and black circles represent the Dirichlet-based strategy.
Figure 4: Comparison of defensive strategies under varying attacker information: MARA case. The results show that the optimal defense consistently achieves the lowest attacker success probability by prioritizing nodes closest to the targets. Both shortest-path and random heuristics yield significantly higher breach probabilities, confirming the benefit of strategic allocation under uncertainty. However, when both players follow a Dirichlet belief structure, applying the Stackelberg policy directly may reduce defensive performance, illustrating the sensitivity of outcomes to modeling assumptions.
Figure 5: Comparison of defensive strategies under varying attacker information: MiR100 case. Due to low path diversity and the presence of dominant bottlenecks (notably node 15), all three game-theoretic frameworks converge to identical performance curves. Topology dominates belief assumptions: once key points are protected, attacker behavior becomes effectively unique. This case illustrates that in highly constrained graphs, identifying structural bottlenecks is sufficient to recover the full benefit of strategic optimization.
...and 3 more figures
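Figure 5 attributes the convergence of the three frameworks to bottleneck nodes such as node 15. One simple way to find such nodes, sketched below for a directed graph stored as adjacency lists, is to test whether removing each node disconnects the entry point from the target; any such node lies on every attack path, so a sensor placed there would cover every route regardless of the attacker's beliefs. The `TOY` graph is hypothetical and is not the MiR100 network, and the function names are illustrative. This brute-force check runs one breadth-first search per node; on large graphs, a dominator-tree algorithm such as Lengauer–Tarjan computes the same set more efficiently.

```python
from collections import deque


def reachable(graph, src, dst, removed=None):
    """Breadth-first search from src to dst that never enters `removed`."""
    if src == removed:
        return False
    seen, queue = {src}, deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            return True
        for nxt in graph.get(node, []):
            if nxt != removed and nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False


def bottlenecks(graph, src, dst):
    """Nodes other than src/dst that lie on every src -> dst path."""
    if not reachable(graph, src, dst):
        raise ValueError("dst is not reachable from src")
    nodes = set(graph) | {n for succ in graph.values() for n in succ}
    # Removing a bottleneck node disconnects the target from the entry point.
    return sorted(n for n in nodes - {src, dst} if not reachable(graph, src, dst, n))


# Hypothetical topology: two redundant segments joined by a single "hub" node.
TOY = {
    "entry": ["n1", "n2"],
    "n1": ["hub"],
    "n2": ["hub"],
    "hub": ["n3", "n4"],
    "n3": ["target"],
    "n4": ["target"],
}

print(bottlenecks(TOY, "entry", "target"))  # ['hub']
```

For an undirected network, the same code applies if each edge is listed in both adjacency lists.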

Theorems & Definitions (9)

Definition 1: Attack MDP structure
Lemma 1
proof
Definition 2
Remark 1
Lemma 2
proof
Theorem 1
proof