SubstratumGraphEnv: Reinforcement Learning Environment (RLE) for Modeling System Attack Paths

Bahirah Adewunmi, Edward Raff, Sanjay Purushotham

TL;DR

A Reinforcement Learning (RL) environment generation framework that simulates the sequence of processes executed on a Windows operating system, enabling dynamic modeling of malicious processes on a system.

Abstract

Automating network security analysis, particularly the identification of potential attack paths, presents significant challenges, due in part to the sequential, interconnected, and evolving nature of system events, which most artificial intelligence (AI) techniques struggle to model effectively. This paper proposes a Reinforcement Learning (RL) environment generation framework that simulates the sequence of processes executed on a Windows operating system, enabling dynamic modeling of malicious processes on a system. The methodology represents operating system state and transitions as a graph derived from open-source System Monitor (Sysmon) logs. To handle the variety of system event types, fields, and log formats, a mechanism was developed to capture and model parent-child process relationships from Sysmon logs. A Gymnasium environment (SubstratumGraphEnv) was constructed to provide the observable basis for the RL environment, and a custom PyTorch interface (SubstratumBridge) was built to translate Gymnasium graphs into Deep Reinforcement Learning (DRL) observations and discrete actions. Graph Convolutional Networks (GCNs) encode the graph's local and global state, which feeds the separate policy and critic heads of an Advantage Actor-Critic (A2C) model. The central contribution of this work is the design of a novel deep graphical RL environment that automates the translation of sequential user and system events, furnishing crucial context for cybersecurity analysis. This work provides a foundation for future research into tuning training parameters and advanced reward shaping, while also offering insight into which system event attributes are critical to training autonomous RL agents.
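To make the parent-child modeling step concrete, here is a minimal sketch under stated assumptions; it is not the paper's SubstratumGraphEnv. It assumes simplified Sysmon process-creation records (Event ID 1) that keep only the `ParentImage` and `Image` fields, collapses each image path to a lowercase process name, and counts repeated spawns as edge weights. The `ProcessWalkEnv` class, its target-based reward, and the use of node names as observations are hypothetical simplifications; it only mirrors Gymnasium's `reset()`/`step()` return conventions and does not import Gymnasium.

```python
import ntpath
from collections import Counter


def process_name(image_path):
    """Reduce a Windows image path to a lowercase process name, e.g. 'cmd'."""
    name = ntpath.basename(image_path).lower()
    return name[:-4] if name.endswith(".exe") else name


def build_process_graph(events):
    """Count parent->child edges from Sysmon process-creation events (Event ID 1)."""
    edges = Counter()
    for ev in events:
        if ev.get("EventID") != 1:
            continue  # only process-creation events define parent-child links
        parent = process_name(ev["ParentImage"])
        child = process_name(ev["Image"])
        edges[(parent, child)] += 1  # repeated spawns produce a heavier edge
    return edges


class ProcessWalkEnv:
    """Hypothetical toy environment: walk parent->child edges toward a target process.

    Discrete action i selects the i-th child (in sorted order) of the current node.
    Returns follow Gymnasium's conventions: reset() -> (obs, info) and
    step() -> (obs, reward, terminated, truncated, info).
    """

    def __init__(self, edges, start, target, max_steps=10):
        self.children = {}
        for parent, child in edges:
            self.children.setdefault(parent, []).append(child)
        for kids in self.children.values():
            kids.sort()  # deterministic action ordering
        self.start, self.target, self.max_steps = start, target, max_steps

    def reset(self):
        self.node, self.steps = self.start, 0
        return self.node, {}

    def step(self, action):
        kids = self.children.get(self.node, [])
        if not 0 <= action < len(kids):
            raise ValueError(f"invalid action {action} at node {self.node!r}")
        self.node = kids[action]
        self.steps += 1
        reached = self.node == self.target
        dead_end = not self.children.get(self.node)  # no children left to follow
        terminated = reached or dead_end
        truncated = self.steps >= self.max_steps and not terminated
        reward = 1.0 if reached else 0.0
        info = {"valid_actions": len(self.children.get(self.node, []))}
        return self.node, reward, terminated, truncated, info


# Hypothetical, simplified log records; real Sysmon events carry many more fields.
EVENTS = [
    {"EventID": 1, "ParentImage": r"C:\Windows\explorer.exe",
     "Image": r"C:\Windows\System32\cmd.exe"},
    {"EventID": 1, "ParentImage": r"C:\Windows\System32\cmd.exe",
     "Image": r"C:\Windows\System32\WindowsPowerShell\v1.0\powershell.exe"},
    {"EventID": 1, "ParentImage": r"C:\Windows\System32\cmd.exe",
     "Image": r"C:\Windows\System32\WindowsPowerShell\v1.0\powershell.exe"},
    {"EventID": 1, "ParentImage": r"C:\Windows\System32\cmd.exe",
     "Image": r"C:\Windows\System32\whoami.exe"},
    {"EventID": 3,  # network connection: ignored by the parent-child extractor
     "Image": r"C:\Windows\System32\WindowsPowerShell\v1.0\powershell.exe"},
]
```

With these records, `build_process_graph(EVENTS)` yields the edges `explorer -> cmd` (weight 1), `cmd -> powershell` (weight 2), and `cmd -> whoami` (weight 1), and two `step(0)` calls from `explorer` reach `powershell` with reward 1.0. A full implementation along the lines the abstract describes would instead emit graph observations (node features plus adjacency) for a GCN encoder rather than a single node name.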

Paper Structure (14 sections, 6 equations, 4 figures, 3 tables, 1 algorithm)

Figures (4)

  • Figure 1: This partial network snapshot of the BRAWL dataset reveals the sparse and disassortative nature of the system's process graph. The node labels, ranging from standard Windows processes such as svchost to strings from the "Fresh Prince" theme, are raw identifiers extracted directly from the dataset's Sysmon logs. While frequent processes such as powershell have numerous connections (thick edges), the low clustering shows that an RL agent cannot rely on dense local patterns and must instead learn to navigate these non-intuitive, sparse pathways to identify the full attack chain.
  • Figure 2: This snapshot of Cerberus Traces reveals a dense, complex process graph in which a few processes, such as pythonw, act as central hubs of activity. Despite this high connectivity, the network's low clustering shows that critical attack sequences are not confined to local, dense event patterns. The RL environment must therefore guide the agent through a large, distributed state space to discover significant sequential connections.
  • Figure 3: High-level pipeline from raw Sysmon relations to TorchRL-ready graph observations.
  • Figure 4: Value loss versus training steps on the Cerberus DRL environment. Decreasing value loss indicates that the critic is learning to predict state values more accurately.
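Figure 4 tracks the critic's value loss. The paper's exact loss terms, coefficients, and return computation are not given here, so the following is only a hedged illustration of how A2C commonly defines that quantity. It computes bootstrapped n-step returns R_t = r_t + gamma * R_{t+1} (reset to zero at episode boundaries), advantages A_t = R_t - V(s_t), the mean-squared value loss that a curve like Figure 4 would plot, and the policy-gradient loss from per-step action log-probabilities. The function names and the `value_coef` default are hypothetical. In a PyTorch implementation the advantage would be detached in the policy term so that only the critic is trained by the value loss.

```python
import math


def discounted_returns(rewards, bootstrap_value, gamma=0.99, dones=None):
    """Bootstrapped n-step returns, computed backward from V(s_T)."""
    R = bootstrap_value
    returns = []
    for t in reversed(range(len(rewards))):
        if dones is not None and dones[t]:
            R = 0.0  # episode ended after step t: do not bootstrap past it
        R = rewards[t] + gamma * R
        returns.append(R)
    return returns[::-1]


def a2c_losses(log_probs, values, rewards, bootstrap_value,
               gamma=0.99, dones=None, value_coef=0.5):
    """Return (policy_loss, value_loss, total_loss) for one rollout segment.

    log_probs: log pi(a_t | s_t) of the actions actually taken.
    values:    critic estimates V(s_t) for the same steps.
    """
    returns = discounted_returns(rewards, bootstrap_value, gamma, dones)
    advantages = [R - v for R, v in zip(returns, values)]
    n = len(rewards)
    # Critic: mean squared advantage, i.e. MSE between returns and V(s_t).
    value_loss = sum(a * a for a in advantages) / n
    # Actor: raise log-probabilities of actions with positive advantage.
    policy_loss = -sum(lp * a for lp, a in zip(log_probs, advantages)) / n
    total_loss = policy_loss + value_coef * value_loss
    return policy_loss, value_loss, total_loss


# Example rollout: reward only at the final step, constant critic estimate 0.5.
policy_loss, value_loss, total = a2c_losses(
    log_probs=[math.log(0.5)] * 3,
    values=[0.5, 0.5, 0.5],
    rewards=[0.0, 0.0, 1.0],
    bootstrap_value=0.0,
    gamma=0.5,
)
```

For this rollout the returns are [0.25, 0.5, 1.0], the advantages are [-0.25, 0.0, 0.5], and the value loss is 0.3125 / 3 (about 0.104). As the critic's estimates approach the returns, the advantages shrink and the value loss falls, which is the behavior Figure 4's caption describes.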