DEFENDCLI: Command-Line Driven Attack Provenance Examination

Peilun Wu, Nan Sun, Nour Moustafa, Youyang Qu, Ming Ding

TL;DR

DEFENDCLI targets limitations of provenance-based EDR by enabling command-line-level analysis within attack provenance graphs, addressing interoperability, reliability, flexibility, and practicality. It introduces Attack-Clause Sketch, which combines a refined, command-line-focused graph structure, hybrid node scoring, and interphase attack association via Leiden communities, and Attack-Evidence Awareness, which features Rule-Based Boosting, a SimHash Ensemble, and InfoPath retrieval with GPT-powered reporting. The system uses Retrieval-Augmented Generation (RAG) with Llama-2 to triage and explain alerts, prioritizing critical threats with a contextual narrative. In evaluations on the DARPA E3 datasets and in industrial real-time detection, DEFENDCLI improves precision by approximately $1.6\times$ over state-of-the-art methods and by up to $2.3\times$ over leading research, while maintaining real-time performance through parallelization. These results demonstrate practical, scalable, high-precision attack provenance analysis that yields actionable insights for security teams.

Abstract

Endpoint Detection and Response (EDR) solutions use attack provenance graphs to discover unknown threats through system event correlation. However, this method still faces unsolved problems in interoperability, reliability, flexibility, and practicability that keep it from delivering actionable results. Our research highlights the limitations of current solutions in detecting obfuscation, correlating attacks, identifying low-frequency events, and ensuring robust context awareness in relation to command-line activities. To address these challenges, we introduce DEFENDCLI, a provenance-graph-based system that, for the first time, delves into command-line-level detection. By offering finer detection granularity, it addresses a gap in modern EDR systems that has been overlooked in previous research. Our solution improves the precision of the information representation by evaluating differentiation across three levels: unusual system process calls, suspicious command-line executions, and infrequent external network connections. This multi-level approach enables EDR systems to be more reliable in complex and dynamic environments. Our evaluation demonstrates that DEFENDCLI improves precision by approximately 1.6x compared to state-of-the-art methods on the DARPA Engagement Series attack datasets. Extensive real-time industrial testing across various attack scenarios further validates its practical effectiveness. The results indicate that DEFENDCLI not only detects previously unknown attack instances that other modern commercial solutions miss, but also achieves a 2.3x improvement in precision over state-of-the-art research work.

Paper Structure

This paper contains 29 sections, 7 equations, 16 figures, 10 tables.

Figures (16)

  • Figure 1: Motivation Feedback: A provenance-based detector [wang2020you] deployed in an enterprise environment. Only 0.07% of the original alerts were truly actionable, meaning they provided accurate attack-related malicious activities, such as command-line executions and network connections, that allow valid verification.
  • Figure 2: Omission: Even if a process chain is malicious, it cannot be analyzed without knowing which command lines were executed, which hinders effective threat verification.
  • Figure 3: Attack-Clause Sketch and Attack-Evidence Awareness: The attack-clause sketch module includes (1) provenance construction for building behavior graphs and (2) command-line evaluation to refine weights using known attacks. The attack-evidence awareness module handles (3) anomaly detection for unknown threats and (4) reporting via RAG-LLM.
  • Figure 4: Comparison of (a) Information-Flow Focused Structure (heterogeneous nodes) and (b) Refined Command-Line Focused Structure (isomorphic process nodes with attributes). The refined structure reduces graph noise by encapsulating command and network data within process nodes.
  • Figure 5: Attack-chain causal inference: Node importance is calculated using PageRank (for rarity) and Shortest-Path Betweenness Centrality (for structural bridges). These scores are aggregated to determine initial edge weights.
  • ...and 11 more figures
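
The node scoring described in the Figure 5 caption (PageRank for rarity, shortest-path betweenness for structural bridges, aggregated into initial edge weights) can be illustrated with a small, dependency-free sketch. This is not the authors' implementation: the caption does not give the aggregation formula, so the `alpha` mixing weight, the min-max normalization, the reading of "rarity" as one minus normalized PageRank, and the averaging of endpoint scores into an edge weight are all assumptions made here for illustration. The toy process chain is likewise hypothetical.

```python
from collections import deque


def pagerank(graph, damping=0.85, iters=100):
    """Power-iteration PageRank on a dict {node: [successors]}."""
    nodes = list(graph)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iters):
        # Rank held by nodes without out-edges is spread uniformly.
        dangling = sum(rank[v] for v in nodes if not graph[v])
        nxt = {v: (1.0 - damping) / n + damping * dangling / n for v in nodes}
        for v in nodes:
            for w in graph[v]:
                nxt[w] += damping * rank[v] / len(graph[v])
        rank = nxt
    return rank


def betweenness(graph):
    """Brandes' algorithm: unnormalized, directed, unweighted betweenness."""
    bc = {v: 0.0 for v in graph}
    for s in graph:
        stack = []
        preds = {v: [] for v in graph}
        sigma = {v: 0 for v in graph}   # number of shortest paths from s
        dist = {v: -1 for v in graph}   # BFS distance from s
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in graph[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {v: 0.0 for v in graph}
        while stack:  # accumulate dependencies in reverse BFS order
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                bc[w] += delta[w]
    return bc


def minmax(scores):
    """Rescale scores to [0, 1]; a constant score map becomes all zeros."""
    lo, hi = min(scores.values()), max(scores.values())
    span = hi - lo
    return {v: (x - lo) / span if span else 0.0 for v, x in scores.items()}


def edge_weights(graph, alpha=0.5):
    """Assumed aggregation: mix rarity and bridge scores, average per edge."""
    pr = minmax(pagerank(graph))
    bc = minmax(betweenness(graph))
    # Assumption: a low PageRank is read as "rare", so rarity = 1 - PageRank.
    score = {v: alpha * (1.0 - pr[v]) + (1.0 - alpha) * bc[v] for v in graph}
    return {(u, v): (score[u] + score[v]) / 2.0 for u in graph for v in graph[u]}


# Hypothetical process chain; every node must appear as a key.
toy = {
    "explorer.exe": ["cmd.exe"],
    "cmd.exe": ["powershell.exe"],
    "powershell.exe": ["curl.exe"],
    "curl.exe": [],
}
weights = edge_weights(toy)
```

Each key of `weights` is a `(parent, child)` edge and each value lies in `[0, 1]`. A production pipeline would more likely rely on a graph library for both centrality measures; the hand-written versions here only keep the sketch self-contained.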