GraphDART: Graph Distillation for Efficient Advanced Persistent Threat Detection

Saba Fathi Rabooki, Bowen Li, Falih Gozi Febrinanto, Ciyuan Peng, Elham Naghizade, Fengling Han, Feng Xia

TL;DR

GraphDART addresses the scalability bottleneck of provenance-graph-based APT detection in cyber-physical-social systems (CPSSs) by distilling large provenance graphs into compact representations, with support for multiple distillation techniques. It combines provenance-graph construction, context-rich node features, and modular graph distillation with a FLASH-based GNN detector, so that detection performance is maintained on much smaller graphs. The approach achieves APT detection results comparable to state-of-the-art baselines while shrinking graphs to as little as 1-5% of their original size and sharply reducing training time, enabling scalable real-time analysis. The framework thus makes security monitoring practical in large, complex CPSS environments without sacrificing detection quality.

Abstract

Cyber-physical-social systems (CPSSs) have emerged in many applications over recent decades, requiring increased attention to security concerns. The rise of sophisticated threats like Advanced Persistent Threats (APTs) makes ensuring security in CPSSs particularly challenging. Provenance graph analysis has proven effective for tracing and detecting anomalies within systems, but the sheer size and complexity of these graphs hinder the efficiency of existing methods, especially those relying on graph neural networks (GNNs). To address these challenges, we present GraphDART, a modular framework designed to distill provenance graphs into compact yet informative representations, enabling scalable and effective anomaly detection. GraphDART can take advantage of diverse graph distillation techniques, including classic and modern graph distillation methods, to condense large provenance graphs while preserving essential structural and contextual information. This approach significantly reduces computational overhead, allowing GNNs to learn from distilled graphs efficiently and enhance detection performance. Extensive evaluations on benchmark datasets demonstrate the robustness of GraphDART in detecting malicious activities across cyber-physical-social systems. By optimizing computational efficiency, GraphDART provides a scalable and practical solution to safeguard interconnected environments against APTs.

Paper Structure (23 sections, 5 equations, 4 figures, 5 tables, 1 algorithm)

Figures (4)

  • Figure 1: Example of system logs and corresponding provenance graph. Entities and events captured in the logs are represented as nodes and edges, respectively, in the provenance graph. Differentiating between benign and malicious nodes necessitates using a threat detection tool.
  • Figure 2: Framework of GraphDART. We first create a provenance graph from the input logs (Section \ref{sec:prvgraph_con}) and develop node features (Section \ref{sec:node_feat_dev}). We then apply graph distillation to obtain the condensed graph (Section \ref{sec:graph_distillation}). Lastly, we train a GNN model to learn graph representations in the training phase and to detect malicious nodes in the inference phase (Section \ref{sec:graph_rep_learn_node_cls}).
  • Figure 3: Node distribution (percentage) across classes in the DARPA datasets. Table \ref{tbl:obsrv_datasets} provides more details on the node classes.
  • Figure 4: APT detection performance across the DARPA TC E3 datasets. GraphDART produces results comparable to FLASH while using small condensed graphs. The figure shows results averaged over $r \in \{0.006, 0.004, 0.002\}$.
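
The mapping described in the Figure 1 caption (entities become nodes, events become edges) can be illustrated with a minimal Python sketch. The record layout `(subject, event, object)`, the sample log entries, and the helper names `build_provenance_graph` and `condensed_node_budget` are all hypothetical, chosen only for illustration; they are not the DARPA TC schema or the authors' implementation. The second helper additionally assumes that `r` in Figure 4 denotes the fraction of nodes kept in the condensed graph, which this excerpt does not state explicitly.

```python
# Hypothetical log records: (subject entity, event type, object entity).
# This layout is an assumption for illustration, not the DARPA TC format.
LOGS = [
    ("process:bash", "fork", "process:curl"),
    ("process:curl", "connect", "socket:10.0.0.5:443"),
    ("process:curl", "write", "file:/tmp/payload"),
    ("process:bash", "execute", "file:/tmp/payload"),
]


def build_provenance_graph(records):
    """Map entities to integer node ids and events to directed, labelled edges."""
    nodes = {}   # entity name -> node id, assigned in order of first appearance
    edges = []   # (source id, target id, event label)
    for subject, event, obj in records:
        for entity in (subject, obj):
            nodes.setdefault(entity, len(nodes))
        edges.append((nodes[subject], nodes[obj], event))
    return nodes, edges


def condensed_node_budget(num_nodes, r):
    """Node count of a condensed graph, ASSUMING r is the fraction of nodes kept."""
    return max(1, round(num_nodes * r))


nodes, edges = build_provenance_graph(LOGS)
print(len(nodes), "nodes,", len(edges), "edges")
for r in (0.006, 0.004, 0.002):
    print(r, condensed_node_budget(1_000_000, r))
```

Under that reading of `r`, a provenance graph with one million nodes would be condensed to a few thousand nodes, which is the scale at which a GNN detector can be trained quickly.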