Leveraging Reinforcement Learning in Red Teaming for Advanced Ransomware Attack Simulations

Cheng Wang, Christopher Redino, Ryan Clark, Abdul Rahman, Sal Aguinaga, Sathvik Murli, Dhruv Nandakumar, Roland Rao, Lanxiao Huang, Daniel Radke, Edward Bowen

TL;DR

This work addresses the need for scalable, objective-driven red teaming against ransomware by developing a reinforcement learning (RL) based attack simulator that runs in a digital twin of a real network. Using Proximal Policy Optimization (PPO) and a detailed Markov decision process (MDP) formulation, the agent learns to discover and target high-value hosts while evading honeyfiles, as demonstrated on a 152-host network. Key contributions include a first network-scale RL model of ransomware attacks, an analysis of how risk aversion affects attack strategies, and insights into improving defenses through honeyfiles and retraining. The findings support proactive defense design and suggest future enhancements with multi-agent dynamics, dynamic honeypots, and multi-objective reinforcement learning (MORL) to jointly optimize attacker and defender objectives.

Abstract

Ransomware presents a significant and increasing threat to individuals and organizations by encrypting their systems and withholding access until a large ransom is paid. To bolster preparedness against potential attacks, organizations commonly conduct red teaming exercises, which involve simulated attacks to assess existing security measures. This paper proposes a novel approach that uses reinforcement learning (RL) to simulate ransomware attacks. By training an RL agent in a simulated environment that mirrors real-world networks, effective attack strategies can be learned quickly, significantly streamlining traditional, manual penetration testing. The attack pathways revealed by the RL agent can provide valuable insights to the defense team, helping them identify network weak points and develop more resilient defensive measures. Experimental results on a 152-host example network confirm the effectiveness of the proposed approach, demonstrating the RL agent's capability to discover and orchestrate attacks on high-value targets while evading honeyfiles (decoy files strategically placed to detect unauthorized access).

Paper Structure

This paper contains 14 sections, 4 equations, 5 figures, 6 tables, and 1 algorithm.

Figures (5)

  • Figure 1: Experiment network topology overview: Nodes depict subnets, with sizes proportional to the number of hosts within. Only subnet 15 (green node) is public.
  • Figure 2: Episode rewards under different risk aversion factors: $\rho=1$ (top), $\rho=5$ (middle), and $\rho=20$ (bottom).
  • Figure 3: Episode lengths under different risk aversion factors.
  • Figure 4: Number of encrypted hosts under different risk aversion factors.
  • Figure 5: Top 15 most frequently encrypted hosts from 100 attack paths under different risk-aversion profiles: $\rho=1$ (top), $\rho=5$ (middle), $\rho=20$ (bottom).
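The figure captions compare agents trained with risk aversion factors $\rho=1$, $5$, and $20$, but this summary does not state how $\rho$ enters the reward. As a minimal sketch only, the Python below assumes that $\rho$ multiplies a fixed penalty for touching a host that holds a honeyfile. The `Host` type, the `honeyfile_penalty` default, and both function names are hypothetical illustrations, not the paper's formulation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Host:
    """Hypothetical host record for a toy attack environment."""
    name: str
    value: float          # assumed payoff for encrypting this host
    has_honeyfile: bool   # True if the host contains a decoy file


def encrypt_reward(host: Host, rho: float, honeyfile_penalty: float = 10.0) -> float:
    """Reward for an 'encrypt' action under the assumed risk-aversion scaling.

    Assumption: hitting a honeyfile costs rho * honeyfile_penalty,
    otherwise the agent collects the host's value.
    """
    if host.has_honeyfile:
        return -rho * honeyfile_penalty
    return host.value


def expected_encrypt_reward(
    host_value: float,
    p_honeyfile: float,
    rho: float,
    honeyfile_penalty: float = 10.0,
) -> float:
    """Expected reward when the agent only knows the honeyfile probability."""
    gain = (1.0 - p_honeyfile) * host_value
    loss = p_honeyfile * rho * honeyfile_penalty
    return gain - loss


if __name__ == "__main__":
    # Same target, three risk profiles matching the values in the captions.
    for rho in (1, 5, 20):
        print(rho, expected_encrypt_reward(host_value=50.0, p_honeyfile=0.2, rho=rho))
```

Under this assumed form, a larger $\rho$ lowers the expected payoff of encrypting an uncertain host, so a more risk-averse agent would be expected to encrypt fewer hosts and to prefer targets it is confident are not decoys. Whether the paper's reward behaves this way should be checked against its MDP definition.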