ProHunter: A Comprehensive APT Hunting System Based on Whole-System Provenance

Xuebo Qiu; Mingqi Lv; Yimei Zhang; Tiantian Zhu; Tieming Chen

Abstract

Advanced Persistent Threats (APTs) remain difficult to detect due to their stealthy nature and long-term persistence. To tackle this challenge, provenance-based threat hunting has gained traction as a proactive defense mechanism. This technique models audit logs as a whole-system provenance graph and searches for subgraphs that match APT patterns recorded in Cyber Threat Intelligence (CTI) reports. However, several limitations persist: 1) significant memory and time overhead due to the extremely large provenance graphs; 2) imprecise segmentation of APT activities from provenance graphs due to their intricate entanglement with benign operations; and 3) poor alignment of attack representations between CTI-derived query graphs and provenance graphs due to their substantial semantic gaps. To address these limitations, this paper presents ProHunter, an efficient and accurate provenance-based APT hunting system with a platform-independent design. To minimize system overhead, ProHunter creates a compact data structure that efficiently stores long-term provenance graphs using semantic abstraction and bit-level hierarchical encoding strategies. To segment APT behaviors, a heuristic-driven threat graph sampling algorithm is designed, which can extract precise attack patterns from provenance graphs. Furthermore, to bridge the semantic gaps between CTI-derived graphs and provenance graphs, ProHunter proposes adaptive graph representation and feature enhancement methods, enabling the extraction of consistent attack semantics at both localized and globalized levels. Extensive evaluations on real-world APT campaigns from DARPA TC E3, E5 and OpTC datasets demonstrate that ProHunter outperforms state-of-the-art threat hunting systems in terms of efficiency and accuracy. Our code is available at https://github.com/xueboQiu/ProHunter.
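To make the idea of modeling audit logs as a provenance graph concrete, the following is a minimal sketch and not ProHunter's implementation. It assumes audit events arrive as (subject, operation, object) triples and uses a plain bounded breadth-first search from a seed node as a stand-in for threat graph sampling. The event data and the function names `build_provenance_graph` and `sample_neighborhood` are hypothetical and chosen only for illustration.

```python
from collections import deque

# Hypothetical audit events as (subject, operation, object) triples.
# Entity names are illustrative and not taken from any dataset.
EVENTS = [
    ("proc:firefox", "connect", "net:10.0.0.5:443"),
    ("proc:firefox", "write", "file:/tmp/payload"),
    ("proc:firefox", "fork", "proc:sh"),
    ("proc:sh", "execute", "file:/tmp/payload"),
    ("proc:cron", "read", "file:/etc/crontab"),
]


def build_provenance_graph(events):
    """Return an adjacency map: node -> list of (operation, neighbor)."""
    graph = {}
    for subject, op, obj in events:
        graph.setdefault(subject, []).append((op, obj))
        graph.setdefault(obj, [])  # objects are nodes even without out-edges
    return graph


def sample_neighborhood(graph, seed, max_hops):
    """Collect nodes reachable from seed within max_hops forward edges.

    This is ordinary bounded BFS, used here as a simple stand-in for
    sampling; it is an assumption, not the paper's heuristic algorithm.
    """
    seen = {seed}
    queue = deque([(seed, 0)])
    while queue:
        node, depth = queue.popleft()
        if depth == max_hops:
            continue
        for _op, neighbor in graph.get(node, []):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, depth + 1))
    return seen


graph = build_provenance_graph(EVENTS)
subgraph_nodes = sample_neighborhood(graph, "proc:firefox", max_hops=2)
```

In this toy example the sampled node set contains the Firefox process and the entities it touched, while the unrelated cron activity is left out. A real system would also need backward traversal, timestamps, and a way to pick seed nodes, all of which are omitted here.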

Paper Structure (63 sections, 6 equations, 13 figures, 15 tables, 1 algorithm)

Introduction
Background
Definitions
Provenance Graph
Query Graph
Points of Interest (POIs)
Problem and Goal
Previous Research Limitation
Efficiency
Adaptability
Accuracy
Scalability
Threat Model
System Design
Overview
...and 48 more sections

Figures (13)

Figure 1: Overview of the three-stage threat hunting pipeline. The attack campaign shown is "Malicious Escalation" from Day 3 of the OpTC dataset. Stage 1 samples a threat graph (b) from a provenance graph (a). Stage 2 extracts a query graph (d) from a CTI report (c). Stage 3 performs attack semantic matching between query and threat graphs to identify threats. Node shapes represent entity types: rectangles (processes), ellipses (files), diamonds (netflows), and pentagons (registries). Colored nodes in (b) and (d) represent matched attack entities, while highlighted entities in (c) correspond to extracted nodes in (d).
Figure 2: System architecture of ProHunter.
Figure 3: The storage architecture of PPG.
Figure 4: Illustration of adaptive graph representation.
Figure 5: Threat graph sampled from E3-Trace: the attack 'Firefox Backdoor with Drakon In-Memory'.
...and 8 more figures
