Deep Reinforcement Learning for Autonomous Cyber Defence: A Survey

Gregory Palmer, Chris Parry, Daniel J. B. Harrold, Chris Willis

TL;DR

A survey of the relevant DRL literature that conceptualizes an idealised ACD-DRL agent and gives an overview of state-of-the-art approaches for scaling DRL to domains that confront learners with the curse of dimensionality.

Abstract

The rapid increase in the number of cyber-attacks in recent years raises the need for principled methods for defending networks against malicious actors. Deep reinforcement learning (DRL) has emerged as a promising approach for mitigating these attacks. However, while DRL has shown much potential for cyber defence, numerous challenges must be overcome before DRL can be applied to the autonomous cyber defence (ACD) problem at scale. Principled methods are required for environments that confront learners with very high-dimensional state spaces, large multi-discrete action spaces, and adversarial learning. Recent works have reported success in solving these problems individually. There have also been impressive engineering efforts towards solving all three for real-time strategy games. However, applying DRL to the full ACD problem remains an open challenge. Here, we survey the relevant DRL literature and conceptualize an idealised ACD-DRL agent. We provide: (i) a summary of the domain properties that define the ACD problem; (ii) a comprehensive comparison of current ACD environments used for benchmarking DRL approaches; (iii) an overview of state-of-the-art approaches for scaling DRL to domains that confront learners with the curse of dimensionality; and (iv) a survey and critique of current methods for limiting the exploitability of agents within adversarial settings from the perspective of ACD. We conclude with open research questions that we hope will motivate future directions for researchers and practitioners working on ACD.

Paper Structure

This paper contains 70 sections, 1 theorem, 9 equations, 14 figures, 4 tables, 1 algorithm.

Key Result

Theorem 1

In a finite zero-sum game:
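For illustration, the minimax theorem for two-player finite zero-sum games says that, once mixed strategies are allowed, the row player's maximin value equals the column player's minimax value. The following is a minimal numeric sketch of that equality, not the paper's formulation: the function names, the grid-search approach, and the choice of matching pennies as the example game are all assumptions made here for illustration.

```python
from typing import List, Tuple

Matrix = List[List[float]]  # payoff[i][j] is the row player's payoff


def row_maximin(payoff: Matrix, steps: int = 100) -> Tuple[float, float]:
    """Grid-search the row player's mixed strategy in a 2x2 zero-sum game.

    p is the probability of playing row 0. For each p, the column player
    is assumed to reply with the pure column that hurts the row player most.
    Returns (best p, guaranteed payoff).
    """
    best_p, best_value = 0.0, float("-inf")
    for i in range(steps + 1):
        p = i / steps
        worst_case = min(
            p * payoff[0][j] + (1 - p) * payoff[1][j] for j in range(2)
        )
        if worst_case > best_value:
            best_p, best_value = p, worst_case
    return best_p, best_value


def col_minimax(payoff: Matrix, steps: int = 100) -> Tuple[float, float]:
    """Grid-search the column player's mixed strategy (q = P(column 0)).

    The column player minimises the row player's best-response payoff.
    Returns (best q, payoff cap).
    """
    best_q, best_value = 0.0, float("inf")
    for i in range(steps + 1):
        q = i / steps
        worst_case = max(
            q * payoff[k][0] + (1 - q) * payoff[k][1] for k in range(2)
        )
        if worst_case < best_value:
            best_q, best_value = q, worst_case
    return best_q, best_value


# Matching pennies (illustrative choice): no pure-strategy saddle point.
MATCHING_PENNIES: Matrix = [[1.0, -1.0], [-1.0, 1.0]]

p_star, lower_value = row_maximin(MATCHING_PENNIES)
q_star, upper_value = col_minimax(MATCHING_PENNIES)
```

With pure strategies only, the row player can guarantee no better than -1 and the column player can cap the payoff no lower than +1; the grid search over mixed strategies closes that gap, which is what the theorem asserts in general. A grid search is only practical for tiny games; larger matrix games are usually solved with linear programming.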

Figures (14)

  • Figure 1: Three challenges that an idealised DRL-ACD agent must conquer.
  • Figure 2: An overview of the problem formulations discussed in this survey. Within these formulations we have the following variables: states $s$, rewards $r$, actions $a$, and observations $o$. For the Parameterized Action MDP we differentiate between discrete actions $a$ and continuous actions $u$.
  • Figure 3: An overview of multi-agent reinforcement learning training schemes.
  • Figure 4: Depiction of an Abstract MDP that includes a mapping $\phi : \mathcal{S} \rightarrow \mathcal{S}_{\phi}$ from the full state $s$ to an abstract state $s_\phi$.
  • Figure 5: HOMER (Adapted from pmlr-v119-misra20a).
  • ...and 9 more figures

Theorems & Definitions (7)

  • Definition 5.1: Bisimulation
  • Definition 7.1: Nash Equilibrium
  • Theorem 1: Minmax Theorem
  • Definition 7.2: $\epsilon$-Nash Equilibrium
  • Definition 7.3: Mixture of Policies
  • Definition 7.4: Approximate Best Response
  • Definition 7.5: Resource Bounded Nash Equilibrium
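The equilibrium definitions listed above are given by name only, so the sketch below uses the standard textbook notion of an $\epsilon$-Nash equilibrium rather than the paper's exact wording: a strategy profile is $\epsilon$-Nash if no player can gain more than $\epsilon$ by deviating unilaterally. The helper names and the rock-paper-scissors payoff matrix are hypothetical and chosen purely for illustration.

```python
from typing import List, Sequence, Tuple

Matrix = List[List[float]]  # payoff[i][j] is the row player's payoff


def expected_payoff(
    payoff: Matrix, row_mix: Sequence[float], col_mix: Sequence[float]
) -> float:
    """Row player's expected payoff under two mixed strategies."""
    return sum(
        row_mix[i] * col_mix[j] * payoff[i][j]
        for i in range(len(row_mix))
        for j in range(len(col_mix))
    )


def deviation_gains(
    payoff: Matrix, row_mix: Sequence[float], col_mix: Sequence[float]
) -> Tuple[float, float]:
    """Largest gain each player can obtain by a unilateral deviation.

    A best response can always be found among pure strategies, so it is
    enough to check each row and each column. The game is zero-sum: the
    column player receives the negative of the row player's payoff.
    """
    value = expected_payoff(payoff, row_mix, col_mix)
    row_best = max(
        sum(col_mix[j] * payoff[i][j] for j in range(len(col_mix)))
        for i in range(len(row_mix))
    )
    col_best = min(
        sum(row_mix[i] * payoff[i][j] for i in range(len(row_mix)))
        for j in range(len(col_mix))
    )
    return row_best - value, value - col_best


def is_epsilon_nash(
    payoff: Matrix,
    row_mix: Sequence[float],
    col_mix: Sequence[float],
    epsilon: float,
) -> bool:
    """True if neither player can gain more than epsilon by deviating."""
    return max(deviation_gains(payoff, row_mix, col_mix)) <= epsilon


# Rock-paper-scissors, rows and columns ordered (rock, paper, scissors).
RPS: Matrix = [[0.0, -1.0, 1.0], [1.0, 0.0, -1.0], [-1.0, 1.0, 0.0]]
UNIFORM = [1 / 3, 1 / 3, 1 / 3]
ALWAYS_ROCK = [1.0, 0.0, 0.0]

uniform_gains = deviation_gains(RPS, UNIFORM, UNIFORM)
rock_gains = deviation_gains(RPS, ALWAYS_ROCK, UNIFORM)
```

Here the uniform profile leaves neither player anything to gain, so it is an exact Nash equilibrium (and therefore $\epsilon$-Nash for every $\epsilon \ge 0$). When the row player always plays rock against a uniform opponent, the column player can gain 1 by switching to paper, so that profile is $\epsilon$-Nash only for $\epsilon \ge 1$. The larger of the two deviation gains is one simple way to quantify how exploitable a profile is.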