Table of Contents
Fetching ...

ProHunter: A Comprehensive APT Hunting System Based on Whole-System Provenance

Xuebo Qiu, Mingqi Lv, Yimei Zhang, Tiantian Zhu, Tieming Chen

Abstract

Advanced Persistent Threats (APTs) remain difficult to detect due to their stealthy nature and long-term persistence. To tackle this challenge, provenance-based threat hunting has gained traction as a proactive defense mechanism. This technique models audit logs as a whole-system provenance graph and searches for subgraphs that match APT patterns recorded in Cyber Threat Intelligence (CTI) reports. However, several limitations persist: 1) significant memory and time overhead due to the extremely large provenance graphs; 2) imprecise segmentation of APT activities from provenance graphs due to their intricate entanglement with benign operations; and 3) poor alignment of attack representations between CTI-derived query graphs and provenance graphs due to their substantial semantic gaps. To address these limitations, this paper presents ProHunter, an efficient and accurate provenance-based APT hunting system with a platform-independent design. To minimize system overhead, ProHunter creates a compact data structure that efficiently stores long-term provenance graphs using semantic abstraction and bit-level hierarchical encoding strategies. To segment APT behaviors, a heuristic-driven threat graph sampling algorithm is designed, which can extract precise attack patterns from provenance graphs. Furthermore, to bridge the semantic gaps between CTI-derived graphs and provenance graphs, ProHunter proposes adaptive graph representation and feature enhancement methods, enabling the extraction of consistent attack semantics at both localized and globalized levels.Extensive evaluations on real-world APT campaigns from DARPA TC E3, E5 and OpTC datasets demonstrate that ProHunter outperforms state-of-the-art threat hunting systems in terms of efficiency and accuracy. Our code is available at https://github.com/xueboQiu/ProHunter.

ProHunter: A Comprehensive APT Hunting System Based on Whole-System Provenance

Abstract

Advanced Persistent Threats (APTs) remain difficult to detect due to their stealthy nature and long-term persistence. To tackle this challenge, provenance-based threat hunting has gained traction as a proactive defense mechanism. This technique models audit logs as a whole-system provenance graph and searches for subgraphs that match APT patterns recorded in Cyber Threat Intelligence (CTI) reports. However, several limitations persist: 1) significant memory and time overhead due to the extremely large provenance graphs; 2) imprecise segmentation of APT activities from provenance graphs due to their intricate entanglement with benign operations; and 3) poor alignment of attack representations between CTI-derived query graphs and provenance graphs due to their substantial semantic gaps. To address these limitations, this paper presents ProHunter, an efficient and accurate provenance-based APT hunting system with a platform-independent design. To minimize system overhead, ProHunter creates a compact data structure that efficiently stores long-term provenance graphs using semantic abstraction and bit-level hierarchical encoding strategies. To segment APT behaviors, a heuristic-driven threat graph sampling algorithm is designed, which can extract precise attack patterns from provenance graphs. Furthermore, to bridge the semantic gaps between CTI-derived graphs and provenance graphs, ProHunter proposes adaptive graph representation and feature enhancement methods, enabling the extraction of consistent attack semantics at both localized and globalized levels.Extensive evaluations on real-world APT campaigns from DARPA TC E3, E5 and OpTC datasets demonstrate that ProHunter outperforms state-of-the-art threat hunting systems in terms of efficiency and accuracy. Our code is available at https://github.com/xueboQiu/ProHunter.
Paper Structure (63 sections, 6 equations, 13 figures, 15 tables, 1 algorithm)

This paper contains 63 sections, 6 equations, 13 figures, 15 tables, 1 algorithm.

Figures (13)

  • Figure 1: Overview of the three-stage threat hunting pipeline. The attack campaign shown is "Malicious Escalation" from Day 3 of the OpTC dataset. Stage 1 samples a threat graph (b) from a provenance graph (a). Stage 2 extracts a query graph (d) from a CTI report (c). Stage 3 performs attack semantic matching between query and threat graphs to identify threats. Node shapes represent entity types: rectangles (processes), ellipses (files), diamonds (netflows), and pentagons (registries). Colored nodes in (b) and (d) represent matched attack entities, while highlighted entities in (c) corresponds to extracted nodes in (d).
  • Figure 2: System architecture of ProHunter.
  • Figure 3: The storage architecture of PPG.
  • Figure 4: Illustration of adaptive graph representation.
  • Figure 5: Threat graph sampled from E3-Trace: the attack 'Firefox Backdoor with Drakon In-Memory'.
  • ...and 8 more figures