
Decoding the MITRE Engenuity ATT&CK Enterprise Evaluation: An Analysis of EDR Performance in Real-World Environments

Xiangmin Shen, Zhenyuan Li, Graham Burleigh, Lingzhi Wang, Yan Chen

TL;DR

This paper critiques MITRE Engenuity ATT&CK evaluations for lacking whole-graph insight and comprehensive interpretation, proposing a two-pronged methodology that combines whole-graph analysis with trend analysis. By constructing causal relationship attack graphs and applying connectivity and effectiveness analyses, the authors assess EDR attack reconstruction and protection capabilities across campaigns. They introduce metrics for detection coverage, confidence, and quality, and examine data sources and cross-platform compatibility to reveal trends and gaps in real-world EDR performance. The findings demonstrate improving detection coverage and contextual detail over time, while also highlighting persistent challenges in cross-host correlation, Linux protection, and living-off-the-land techniques. The work aims to bridge the gap between raw MITRE results and actionable insights for researchers, practitioners, and vendors, ultimately informing more robust EDR evaluations and security practices.
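The connectivity analysis mentioned above can be illustrated with a minimal sketch. It assumes a criterion the paper may not use exactly: an EDR "reconstructs" an attack when the steps it detected form a single weakly connected subgraph of the causal relationship attack graph, so a missed intermediate step splits the chain. The function name `is_reconstructed` and the step labels are hypothetical.

```python
from collections import defaultdict, deque


def is_reconstructed(edges, detected):
    """Hypothetical check: do the detected attack steps form one connected chain?

    edges:    list of (cause, effect) pairs between attack-step IDs.
    detected: set of attack-step IDs that a given EDR flagged.
    Only edges whose endpoints were BOTH detected are kept, so an
    undetected intermediate step breaks the reconstructed chain.
    """
    if not detected:
        return False
    adj = defaultdict(set)
    for u, v in edges:
        if u in detected and v in detected:
            adj[u].add(v)
            adj[v].add(u)  # ignore direction: weak connectivity
    # Breadth-first search from an arbitrary detected step.
    start = next(iter(detected))
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in adj[node]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    # Connected iff every detected step was reached.
    return seen == set(detected)
```

For a chain s1 → s2 → s3 → s4, detecting s1, s2, and s3 keeps the chain intact, whereas detecting s1, s2, and s4 leaves s4 isolated. Treating edges as undirected is a design choice: it asks only whether the detections can be linked, not whether they were observed in causal order.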

Abstract

Endpoint detection and response (EDR) systems have emerged as a critical component of enterprise security solutions, effectively combating endpoint threats such as APT attacks with extended lifecycles. Given the growing significance of EDR systems, many cybersecurity vendors have developed their own proprietary EDR solutions. Users therefore need to assess the capabilities of these detection engines to make informed decisions about which products to choose. This need is especially pressing given the market's size, which is expected to reach around 3.7 billion dollars by 2023 and is still expanding. MITRE is a leading organization in cyber threat analysis. In 2018, MITRE began conducting annual APT emulations that cover major EDR vendors worldwide, with indicators that include telemetry, detection, and blocking capability, among others. However, the evaluation results published by MITRE do not include any further interpretations or suggestions. In this paper, we thoroughly analyze MITRE evaluation results to gain further insights into the real-world EDR systems under test. Specifically, we design a whole-graph analysis method that uses additional control-flow and data-flow information to measure the performance of EDR systems. In addition, we analyze MITRE evaluation results over multiple years from various aspects, including detection coverage, detection confidence, detection modifiers, data sources, and compatibility. From these studies, we compile a thorough summary of our findings and derive valuable insights from the evaluation results. We believe these summaries and insights can help researchers, practitioners, and vendors better understand the strengths and limitations of mainstream EDR products.
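Detection-coverage scores of the kind plotted per technique and per vendor in Figures 3 and 4 can be computed from per-substep detection categories. The sketch below assumes MITRE's convention that a Telemetry detection counts toward visibility while General, Tactic, and Technique detections count toward analytic coverage. The input layout, the substep labels such as "1.A.1", and both function names are hypothetical. The paper's confidence and quality metrics are not reproduced because their definitions are not given here.

```python
# Assumed category classes (MITRE-style naming); "None" counts toward neither.
ANALYTIC = {"General", "Tactic", "Technique"}
VISIBLE = ANALYTIC | {"Telemetry"}  # telemetry alone still counts as visible


def technique_perspective(results):
    """results: {substep_id: {vendor: best_category}}.

    Returns {substep_id: (visibility, analytic_coverage)}, each the
    fraction of vendors whose best detection falls in that class.
    """
    scores = {}
    for substep, by_vendor in results.items():
        n = len(by_vendor)
        if n == 0:
            continue  # no vendors evaluated on this substep
        cats = by_vendor.values()
        visible = sum(c in VISIBLE for c in cats)
        analytic = sum(c in ANALYTIC for c in cats)
        scores[substep] = (visible / n, analytic / n)
    return scores


def vendor_perspective(results):
    """Same metrics, but each fraction is taken over substeps for one vendor."""
    flipped = {}
    for substep, by_vendor in results.items():
        for vendor, cat in by_vendor.items():
            flipped.setdefault(vendor, {})[substep] = cat
    # After flipping the nesting, the outer key is the vendor.
    return technique_perspective(flipped)
```

Reusing one scoring function for both perspectives keeps the two views consistent: only the axis over which the fraction is taken changes.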

Paper Structure (38 sections, 2 equations, 7 figures, 3 tables)

Figures (7)

  • Figure 1: The attack graphs for scenario 1 in the Wizard Spider+Sandworm (2022) evaluation. (a) The actual attack graph. The nodes are system entities such as processes and files. The edges represent system events characterized by MITRE ATT&CK technique IDs. The numbers denote the order of events. (b) The causal relationship attack graph. The nodes are attack steps characterized by MITRE ATT&CK technique IDs. The edges represent causal relationships between attack steps. Each node is also annotated with the visibility of its corresponding technique across all EDR systems and the number of EDR systems that blocked the attack at or before this step.
  • Figure 2: An overview of our analysis methodologies.
  • Figure 3: Technique perspective score distribution of each metric in different evaluations. The metrics are visibility (blue), analytic coverage (orange), confidence (green), and quality (red) from left to right, respectively.
  • Figure 4: Vendor perspective score distribution of each metric in different evaluations. The metrics are visibility (blue), analytic coverage (orange), confidence (green), and quality (red) from left to right, respectively.
  • Figure 5: The actual attack graph and the causal relationship attack graph for scenario 2 in Wizard Spider+Sandworm (2022) evaluation.
  • ...and 2 more figures
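The per-node block count in Figure 1(b), the number of EDR systems that blocked the attack at or before a step, can be derived by propagating each vendor's block point along the causal edges. The following is a minimal sketch, assuming that "before" means "is a causal ancestor of" and that each vendor blocks at most once. The function `blocked_up_to` and its input layout are hypothetical.

```python
from collections import defaultdict


def blocked_up_to(edges, block_step):
    """edges:      (cause, effect) pairs of the causal relationship attack graph.
    block_step: {vendor: step where it blocked the attack, or None}.

    Returns {step: number of vendors that blocked at or before that step}.
    Steps that appear in no edge are not reported.
    """
    parents = defaultdict(set)
    nodes = set()
    for u, v in edges:
        parents[v].add(u)
        nodes.update((u, v))

    def ancestors_or_self(node):
        # Depth-first walk over reversed edges; the seen set also guards cycles.
        seen = {node}
        stack = [node]
        while stack:
            for p in parents[stack.pop()]:
                if p not in seen:
                    seen.add(p)
                    stack.append(p)
        return seen

    blocked = [s for s in block_step.values() if s is not None]
    counts = {}
    for node in nodes:
        anc = ancestors_or_self(node)
        counts[node] = sum(1 for s in blocked if s in anc)
    return counts
```

On a branching graph, a block on one branch is not credited to a sibling branch, because the sibling is not a causal descendant of the blocked step.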