Introducing a New Alert Data Set for Multi-Step Attack Analysis
Max Landauer, Florian Skopik, Markus Wurzenberger
TL;DR
This work addresses the lack of publicly available data sets for multi-step attack analysis by introducing a richly labeled alert data set derived from the AIT-LDSv2, with alerts produced by three detectors (Wazuh, Suricata, AMiner) across eight scenarios. It provides over 2.6 million alerts with 93 detector signatures and demonstrates practical utility through detector prioritization, alert aggregation, and attack-graph mining using open-source tools (AECID-Alert-Aggregation and SAGE). The findings show substantial reductions in alert review workload (average reduction rates up to $98.93\%$ for meta-alerts) while preserving attack-relevant signals, highlighting the data set’s value for evaluating prioritization, filtering, meta-alert generation, and graph-based attack analysis. The data set thus enables reproducible research and practical assessment of multi-step attack analytics in heterogeneous, multi-source environments, with potential extensions to attack-pattern recognition and attribution.
Abstract
Intrusion detection systems (IDSs) reinforce cyber defense by autonomously monitoring various data sources for traces of attacks. However, IDSs are also infamous for frequently raising false positives and alerts that are difficult to interpret without context. This results in high workloads for security operators, who need to manually verify all reported alerts, often leading to fatigue and incorrect decisions. To generate more meaningful alerts and alleviate these issues, research on multi-step attack analysis proposes approaches for filtering, clustering, and correlating IDS alerts, as well as for generating attack graphs. Unfortunately, existing data sets are outdated, unreliable, narrowly focused, or only suitable for IDS evaluation. Since hardly any suitable benchmark data sets are publicly available, researchers often resort to private data sets, which prevents reproducibility of evaluations. We therefore generate a new alert data set that we publish alongside this paper. The data set contains alerts from three distinct IDSs monitoring eight executions of a multi-step attack as well as simulations of normal user behavior. To illustrate the potential of our data set, we experiment with alert prioritization as well as two open-source tools for meta-alert generation and attack graph extraction.
