A Framework for Adversarial Analysis of Decision Support Systems Prior to Deployment

Brett Bissey, Kyle Gatesman, Walker Dimon, Mohammad Alam, Luis Robaina, Joseph Weissman

TL;DR

This work addresses the vulnerability of DRL-enabled decision-support systems to adversarial observation perturbations before deployment. It introduces a structured methodology to collect attack data, design realistic perturbations, and measure the impact on end-of-episode environment properties within CyberStrike, including a formal property model and various impact metrics. A key contribution is the combination of visualization, property-impact ranking across observation indices and time steps, and cross-algorithm transferability analysis under ADR and curriculum learning. The findings demonstrate that optimally timed, targeted perturbations can meaningfully shift outcomes and that transferability varies by algorithm and target, highlighting the need for robust adversarial evaluation and defense strategies in high-stakes decision-making systems.

Abstract

This paper introduces a comprehensive framework designed to analyze and secure decision-support systems trained with Deep Reinforcement Learning (DRL), prior to deployment, by providing insights into learned behavior patterns and vulnerabilities discovered through simulation. The introduced framework aids in the development of precisely timed and targeted observation perturbations, enabling researchers to assess adversarial attack outcomes within a strategic decision-making context. We validate our framework, visualize agent behavior, and evaluate adversarial outcomes within the context of a custom-built strategic game, CyberStrike. Utilizing the proposed framework, we introduce a method for systematically discovering and ranking the impact of attacks on various observation indices and time-steps, and we conduct experiments to evaluate the transferability of adversarial attacks across agent architectures and DRL training algorithms. The findings underscore the critical need for robust adversarial defense mechanisms to protect decision-making policies in high-stakes environments.

Paper Structure

This paper contains 30 sections, 5 equations, 7 figures, 1 table.

Figures (7)

  • Figure 1: RL interaction loop with an attack injected at time step $t$. This time step ends with the environment dynamics using the agent's action $a_t$ and the true state $s_t$ to compute the next state $s_{t+1}$ and the reward $r_{t+1}$. Time step $t + 1$ may or may not have an attack.
  • Figure 2: Example set of attacked episode simulations stemming from an unattacked episode with $4$ actions (top line). In this scenario, the attack algorithm ran several attacks on state $s_1$ (at time step $1$), and two of these attacks induced adversarial actions $a_1'$ and $a_1''$ that sufficiently differ from the original action $a_1$, meeting the criteria for simulating the rest of the episode. Taking the adversarial actions $a_1'$ and $a_1''$ from state $s_1$ will produce states $s_2'$ and $s_2''$, respectively, which may or may not differ from $s_2$.
  • Figure 3: A notional CyberStrike state. The blue agent controls nodes B0, B1, B2, and B3. Blue chooses actions that control all blue nodes simultaneously, locating and disabling the target red node by peeling back the layers of the red defense network until the target node is undefended. In this example, the target node ($R0$) is defended by $R1$ and $R2$. $R2$ is defended by $R3$, which is defended by $R4$. $R1$ is defended by $R5$, $R6$, and $R7$. $R6$ is also defended by $R7$. Dashed lines denote connections marked as unknown in the agent's observation, whereas solid lines denote known connections. The agent begins with a fully unknown network and must use its hackers to discover enough of the network topology to reveal the defenders of the target node ($R0$) and eventually hack into it.
  • Figure 4: This latent space representation maps a policy's CyberStrike observations from initial time-steps in the northwest region to the final time-steps in the southeast region, with an aggregation of various intermediate trajectories connecting the initial and final observations. Attacks within the denser, bluer northeast region of the space are unlikely to yield nonzero changes in final red counts, whereas attacks in the sparser and redder western regions are more likely to be successful (increase final red counts). The sparsity of activation embeddings in the western region of the latent space representation suggests the policy is less likely to have trained on observations in this region and thus is more vulnerable to adversarial attacks when acting within this region.
  • Figure 5: The post-attack change (delta) in Average Final Red Count, aggregated per observation index across all time-steps. Eight of the ten most impactful attacked observation indices are observed defense network nodes, suggesting that an attacker's best chance of increasing the final red count is to perturb the DQN agent's perception of the network structure at various adjacency nodes.
  • ...and 2 more figures
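
The attack loop described in Figure 1 and the per-index aggregation described in Figure 5 can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: `toy_policy`, `toy_step`, `final_property`, `run_episode`, and `rank_indices` are hypothetical names, the toy dynamics stand in for CyberStrike, and a single fixed perturbation value replaces the paper's attack algorithm. The one detail taken directly from the Figure 1 caption is that the perturbation alters only the agent's observation, while the dynamics still use the true state.

```python
from collections import defaultdict


def toy_policy(obs):
    """Hypothetical stand-in policy: pick the index of the largest observed entry."""
    return max(range(len(obs)), key=lambda i: obs[i])


def toy_step(state, action):
    """Hypothetical dynamics: decrement the chosen entry; reward 1 if it was positive."""
    next_state = list(state)
    reward = 1.0 if next_state[action] > 0 else 0.0
    next_state[action] = max(0, next_state[action] - 1)
    return next_state, reward


def final_property(state):
    """Hypothetical end-of-episode property (loosely analogous to a final count)."""
    return sum(state)


def run_episode(initial_state, horizon, attack=None):
    """Run one episode; `attack` is (time_step, obs_index, new_value) or None."""
    state = list(initial_state)
    total_reward = 0.0
    for t in range(horizon):
        obs = list(state)  # the agent's view of the true state
        if attack is not None and attack[0] == t:
            _, idx, value = attack
            obs[idx] = value  # perturb the observation only
        action = toy_policy(obs)
        # The environment advances from the TRUE state, not the perturbed one.
        state, reward = toy_step(state, action)
        total_reward += reward
    return state, total_reward


def rank_indices(initial_state, horizon, value):
    """Attack every (time step, index) pair once and rank indices by the
    mean change in the final property relative to the unattacked episode."""
    baseline_state, _ = run_episode(initial_state, horizon)
    baseline = final_property(baseline_state)
    deltas = defaultdict(list)
    for t in range(horizon):
        for idx in range(len(initial_state)):
            final_state, _ = run_episode(
                initial_state, horizon, attack=(t, idx, value)
            )
            deltas[idx].append(final_property(final_state) - baseline)
    mean_delta = {idx: sum(v) / len(v) for idx, v in deltas.items()}
    return sorted(mean_delta.items(), key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    for idx, delta in rank_indices([3, 1, 0], horizon=4, value=10):
        print(f"observation index {idx}: mean delta {delta:+.2f}")
```

In this toy setting, perturbing index 2 ranks first because it always tricks the policy into wasting an action on an empty entry. Index 0 matters only at the last time step, and index 1 never changes the outcome, which mirrors the idea that attack impact depends on both the observation index and the timing.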