Optimal Defender Strategies for CAGE-2 using Causal Modeling and Tree Search

Kim Hammar, Neil Dhir, Rolf Stadler

TL;DR

This paper presents a formal (causal) model of CAGE-2 together with a method, Causal Partially Observable Monte-Carlo Planning (C-POMCP), that produces a provably optimal defender strategy.

Abstract

The CAGE-2 challenge is considered a standard benchmark to compare methods for autonomous cyber defense. Current state-of-the-art methods evaluated against this benchmark are based on model-free (offline) reinforcement learning, which does not provide provably optimal defender strategies. We address this limitation and present a formal (causal) model of CAGE-2 together with a method that produces a provably optimal defender strategy, which we call Causal Partially Observable Monte-Carlo Planning (C-POMCP). It has two key properties. First, it incorporates the causal structure of the target system, i.e., the causal relationships among the system variables. This structure allows for a significant reduction of the search space of defender strategies. Second, it is an online method that updates the defender strategy at each time step via tree search. Evaluations against the CAGE-2 benchmark show that C-POMCP achieves state-of-the-art performance with respect to effectiveness and is two orders of magnitude more efficient in computing time than the closest competitor method.
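
The two properties named in the abstract, a reduced search space and online tree search at each time step, can be illustrated with a minimal sketch. It is not the authors' C-POMCP implementation: it is a generic POMCP-style planner (particle belief, UCB1 tree search with random rollouts) on a hypothetical two-state intrusion model, and `PRUNED_ACTIONS` is a hand-picked stand-in for the action set that a causal analysis would produce. All names, probabilities, and costs below are assumptions made for illustration.

```python
import math
import random

# Hypothetical toy model (NOT the CAGE-2 model): state 0 = healthy, 1 = compromised.
ACTIONS = ("wait", "restore", "reboot_printer")
# Hand-picked stand-in for a causally pruned action set (assumption).
PRUNED_ACTIONS = ("wait", "restore")

def step(state, action, rng):
    """Generative model: returns (next_state, observation, reward)."""
    reward = -5.0 if state == 1 else 0.0      # cost of being compromised
    next_state = state
    if action == "restore":
        next_state = 0
        reward -= 1.0                         # cost of restoring
    else:
        if action == "reboot_printer":
            reward -= 0.5                     # costly action with no effect on the state
        if state == 0 and rng.random() < 0.2:
            next_state = 1                    # attacker compromises the node
    obs = int(rng.random() < (0.8 if next_state == 1 else 0.1))  # noisy alert
    return next_state, obs, reward

class Node:
    def __init__(self):
        self.visits = 0
        self.value = 0.0
        self.children = {}

def rollout(state, depth, actions, rng, gamma):
    """Estimate the return of a uniformly random policy."""
    total, discount = 0.0, 1.0
    for _ in range(depth):
        state, _, reward = step(state, rng.choice(actions), rng)
        total += discount * reward
        discount *= gamma
    return total

def simulate(node, state, depth, actions, rng, gamma, c):
    """One simulation through the tree from a belief node."""
    if depth == 0:
        return 0.0
    if not node.children:                     # first visit: expand, then roll out
        node.children = {a: Node() for a in actions}
        return rollout(state, depth, actions, rng, gamma)

    def ucb(a):
        child = node.children[a]
        if child.visits == 0:
            return math.inf
        return child.value + c * math.sqrt(math.log(node.visits) / child.visits)

    action = max(actions, key=ucb)
    next_state, obs, reward = step(state, action, rng)
    action_node = node.children[action]
    obs_node = action_node.children.setdefault(obs, Node())
    ret = reward + gamma * simulate(obs_node, next_state, depth - 1,
                                    actions, rng, gamma, c)
    node.visits += 1
    action_node.visits += 1
    action_node.value += (ret - action_node.value) / action_node.visits
    return ret

def plan(belief, actions, n_sims=2000, depth=10, gamma=0.95, c=5.0, seed=0):
    """Online planning: search from the current belief, return the most-visited action."""
    rng = random.Random(seed)
    root = Node()
    for _ in range(n_sims):
        simulate(root, rng.choice(belief), depth, actions, rng, gamma, c)
    return max(actions, key=lambda a: root.children[a].visits)

def update_belief(belief, action, obs, rng, n_particles=200):
    """Particle filter by rejection sampling on the received observation."""
    particles = []
    while len(particles) < n_particles:
        next_state, sampled_obs, _ = step(rng.choice(belief), action, rng)
        if sampled_obs == obs:
            particles.append(next_state)
    return particles
```

In use, the defender would call `plan(belief, PRUNED_ACTIONS)` at every time step, execute the returned action, and pass the resulting observation to `update_belief`. Searching over `PRUNED_ACTIONS` instead of `ACTIONS` shrinks the branching factor of the tree, which is the kind of saving the abstract attributes to the causal structure.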

Paper Structure

This paper contains 31 sections, 12 theorems, 50 equations, 12 figures, 8 tables, 1 algorithm.

Key Result

Theorem 1

Assume that $C_t,\mathcal{A},\mathcal{V},q_t,\beta_{I_{i,t},z},\psi_{z}$ are finite and that $\mathcal{T}$ is finite or $\gamma < 1$. Then there exists an optimal deterministic defender strategy $\pi^{\star}_{\mathrm{D}}$. If $\mathcal{T}=\infty$, then there exists a $\pi^{\star}_{\mathrm{D}}$ that is sta...

Figures (12)

  • Figure 1: The CAGE-2 scenario [cage_challenge_2_announcement]: a defender aims to protect a networked system against an Advanced Persistent Threat (APT) caused by an attacker while maintaining services for clients; the system configuration is listed in Appendix \ref{appendix:infrastructure_configuration}.
  • Figure 2: Related work on autonomous cyber defense; this paper addresses APT defense using causality, control theory, and reinforcement learning.
  • Figure 3: Causal graphs [pearl2000causality]; circles represent variables in an SCM (\ref{eq:scm_def}); solid arrows represent causal relations, and dashed edges represent effects caused by latent variables; latent variables can be represented either with shaded circles or with bidirected dashed edges, i.e., the graphs in a) and b) represent the same causal structure.
  • Figure 4: Two causal graphs and the corresponding sets of POMISs (Def. \ref{def:pomis}); $J$ is the target variable, and all other variables are manipulable.
  • Figure 5: Transition diagram of the intrusion state $I_{i,t}$ (\ref{eq:intrusion_state_fun}); self-transitions are not shown; disks represent states; arrows represent state transitions; labels indicate conditions for state transition; the initial state is $I_{i,1}=\mathsf{U}$.
  • ...and 7 more figures

Theorems & Definitions (31)

  • Definition 1: Causal effect identifiability [pearl2000causality]
  • Definition 2: Control problem identifiability [pearl2000causality]
  • Definition 3: POMIS, adapted from [lee2019structural]
  • Remark 1
  • Remark 2
  • Theorem 1
  • proof
  • Theorem 2
  • proof
  • Remark 3
  • ...and 21 more