Human-AI Collaboration in Cloud Security: Cognitive Hierarchy-Driven Deep Reinforcement Learning

Zahra Aref, Sheng Wei, Narayan B. Mandayam

TL;DR

The paper proposes a Cognitive Hierarchy Theory-driven Deep Q-Network (CHT-DQN) framework that models interactive decision-making between SOC analysts and AI-driven APT bots. The results highlight the potential of integrating cognitive models into deep reinforcement learning to improve real-time SOC decision-making for cloud security.

Abstract

Given the complexity of multi-tenant cloud environments and the growing need for real-time threat mitigation, Security Operations Centers (SOCs) must adopt AI-driven adaptive defense mechanisms to counter Advanced Persistent Threats (APTs). However, SOC analysts struggle to handle adaptive adversarial tactics and therefore need intelligent decision-support frameworks. We propose a Cognitive Hierarchy Theory-driven Deep Q-Network (CHT-DQN) framework that models interactive decision-making between SOC analysts and AI-driven APT bots. The SOC analyst (defender) operates at cognitive level-1, anticipating attacker strategies, while the APT bot (attacker) follows a level-0 policy. By incorporating CHT into DQN, our framework enhances adaptive SOC defense using Attack Graph (AG)-based reinforcement learning. Simulation experiments across varying AG complexities show that CHT-DQN consistently achieves higher data protection and lower action discrepancies than standard DQN. A theoretical lower bound further confirms its superiority as AG complexity increases. A human-in-the-loop (HITL) evaluation on Amazon Mechanical Turk (MTurk) reveals that SOC analysts using CHT-DQN-derived transition probabilities align more closely with adaptive attackers, leading to better defense outcomes. Moreover, human behavior aligns with Prospect Theory (PT) and Cumulative Prospect Theory (CPT): participants are less likely to reselect failed actions and more likely to persist with successful ones. This asymmetry reflects amplified loss sensitivity and biased probability weighting: participants underestimate gains after failure and overestimate continued success. Our findings highlight the potential of integrating cognitive models into deep reinforcement learning to improve real-time SOC decision-making for cloud security.

Paper Structure

This paper contains 40 sections, 1 theorem, 19 equations, 8 figures, 1 table.

Key Result

Theorem 1

As the number of attack graph nodes $N \to \infty$, the Q-value function for the SOC analyst under CHT-DQN is lower bounded by the Q-value function under DQN, assuming a stationary and known attack strategy.
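
One way to read this statement in symbols is sketched below. The superscript labels are notation introduced here for illustration and may differ from the paper's own; $Q_{\mathcal{D}}$, $s$, and $a_{\mathcal{D}}$ follow the Figure 2 caption.

$$
Q_{\mathcal{D}}^{\text{CHT-DQN}}(s, a_{\mathcal{D}}) \;\ge\; Q_{\mathcal{D}}^{\text{DQN}}(s, a_{\mathcal{D}}) \quad \text{as } N \to \infty,
$$

where the attacker's strategy is assumed to be stationary and known, as the theorem requires.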

Figures (8)

  • Figure 1: Illustrations of real-time SOC decision-making using attack graphs (AGs).
  • Figure 2: Overview of the proposed CHT-DQN framework for cloud security. The framework models interactions between the SOC analyst (defender) and the APT attacker using AGs and deep reinforcement learning. The SOC analyst module includes experience replay storage ($e_{\mathcal{D}}$), minibatch sampling, and a target neural network (NN) for updating policy $\pi_{\mathcal{D}}(a_{\mathcal{D}}|s)$. The SOC analyst computes $Q_{\mathcal{D}}(s, a_{\mathcal{D}}, \boldsymbol{\theta}_{\mathcal{D}})$ and applies a softmax function on $Q_{\mathcal{A}}$ to estimate the attacker's action probabilities $\mathbb{P}(a_{\mathcal{A}} | s, k = 0)$. Similarly, the attacker module stores experiences ($e_{\mathcal{A}}$), samples minibatches, and optimizes its policy $\pi_{\mathcal{A}}(a_{\mathcal{A}} | s)$. The forward connections highlight the SOC analyst’s predictive modeling of the attacker’s actions, leveraging the Cognitive Hierarchy Theory (CHT). Both modules use fully connected (FC) layers to process state-action pairs and optimize loss functions $L_{\mathcal{D}}(\boldsymbol{\theta}_{\mathcal{D}})$ and $L_{\mathcal{A}}(\boldsymbol{\theta}_{\mathcal{A}})$.
  • Figure 3: Human-Interactive Web-Based DRL Games on Amazon MTurk. The SOC analyst (defender) interacts with a DQN-based attacker under two different strategic information settings.
  • Figure 4: Convergence Analysis: Running Average Rewards Over Time for SOC Analyst (Defender) and Attacker in Different Scenarios with a 6-Node AG. Timesteps 0 to 1000 represent training with epsilon-greedy, and timesteps 1000 to 2000 represent evaluation with pure exploitation. The results were averaged over 10 random seeds.
  • Figure 5: Performance Comparison of CHT-DQN and DQN SOC Analyst Strategies. The figures illustrate the advantages of using CHT-DQN over standard DQN in terms of data protection and action alignment.
  • ...and 3 more figures
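
The Figure 2 caption describes the SOC analyst applying a softmax to the attacker's Q-values to estimate $\mathbb{P}(a_{\mathcal{A}} | s, k = 0)$. The following is a minimal Python sketch of that step, under stated assumptions: the function names, the `temperature` parameter, and the joint-action table `q_defender[a_d][a_a]` are hypothetical illustrations, not the paper's implementation, and plain lists of numbers stand in for the neural-network outputs.

```python
import math


def softmax(q_values, temperature=1.0):
    """Turn a list of Q-values into a probability distribution."""
    peak = max(q_values)  # subtract the max for numerical stability
    exps = [math.exp((q - peak) / temperature) for q in q_values]
    total = sum(exps)
    return [e / total for e in exps]


def level1_defender_action(q_defender, q_attacker, temperature=1.0):
    """Pick the defender action with the highest expected Q-value.

    q_attacker[a_a]      : attacker Q-values in the current state
                           (stand-in for the attacker network output).
    q_defender[a_d][a_a] : ASSUMED joint-action table of defender Q-values;
                           a hypothetical structure used only for illustration.
    """
    # Level-1 reasoning: model the level-0 attacker via softmax over its Q-values.
    p_attack = softmax(q_attacker, temperature)
    # Average each defender action's value over the predicted attacker actions.
    expected = [
        sum(p * q for p, q in zip(p_attack, row)) for row in q_defender
    ]
    best = max(range(len(expected)), key=expected.__getitem__)
    return best, expected, p_attack


if __name__ == "__main__":
    q_def = [[1.0, 3.0], [2.0, 0.0]]  # hypothetical values
    q_att = [10.0, 0.0]               # attacker strongly prefers action 0
    action, expected_q, probs = level1_defender_action(q_def, q_att)
    print(action, expected_q, probs)
```

In this toy setup, the defender's choice shifts as the predicted attacker distribution changes, which is the behavior the forward connections in Figure 2 are meant to capture.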

Theorems & Definitions (1)

  • Theorem 1