Introducing a New Alert Data Set for Multi-Step Attack Analysis

Max Landauer, Florian Skopik, Markus Wurzenberger

TL;DR

This work tackles the lack of publicly available data sets for multi-step attack analysis by introducing a richly labeled alert data set derived from the AIT-LDSv2 and generated with three detectors (Wazuh, Suricata, AMiner) across eight scenarios. It provides over 2.6 million alerts from 93 detector signatures and demonstrates practical utility through detector prioritization, alert aggregation, and attack-graph mining using open-source tools (AECID-Alert-Aggregation and SAGE). The findings show substantial reductions in alert review workload (average reduction rates of up to 98.93% for meta-alerts) while preserving attack-relevant signals, highlighting the data set's value for evaluating prioritization, filtering, meta-alert generation, and graph-based attack analysis. The data set thus enables reproducible research and practical assessment of multi-step attack analytics in heterogeneous, multi-source environments, with potential extensions to attack-pattern recognition and attribution.
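
The reduction rate quoted above can be made concrete with a small calculation. This summary does not give the paper's exact formula, so the sketch below assumes the common definition: the share of review items saved when an operator inspects meta-alerts instead of raw alerts. The function name `reduction_rate` and the alert counts are hypothetical and chosen only for illustration.

```python
def reduction_rate(num_alerts: int, num_meta_alerts: int) -> float:
    """Return the fraction of review items saved by aggregation.

    Assumed definition (not taken from the paper):
        1 - (number of meta-alerts / number of raw alerts)
    """
    if num_alerts <= 0:
        raise ValueError("num_alerts must be positive")
    if not 0 <= num_meta_alerts <= num_alerts:
        raise ValueError("num_meta_alerts must lie between 0 and num_alerts")
    return 1.0 - num_meta_alerts / num_alerts


if __name__ == "__main__":
    # Hypothetical counts for one scenario, not values from the data set.
    raw_alerts = 10_000
    meta_alerts = 150
    print(f"Reduction rate: {reduction_rate(raw_alerts, meta_alerts):.2%}")
```

Under this assumed definition, a rate close to 100% means that a small number of meta-alerts stands in for a large volume of raw alerts. Whether the attack-relevant alerts survive the aggregation has to be checked separately against the labels.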

Abstract

Intrusion detection systems (IDSs) reinforce cyber defense by autonomously monitoring various data sources for traces of attacks. However, IDSs are also infamous for frequently raising false positives and alerts that are difficult to interpret without context. This results in high workloads for security operators, who need to manually verify all reported alerts, often leading to fatigue and incorrect decisions. To generate more meaningful alerts and alleviate these issues, research on multi-step attack analysis proposes approaches for filtering, clustering, and correlating IDS alerts, as well as for generating attack graphs. Unfortunately, existing data sets are outdated, unreliable, narrowly focused, or only suitable for IDS evaluation. Since hardly any suitable benchmark data sets are publicly available, researchers often resort to private data sets, which prevents reproducibility of evaluations. We therefore generate a new alert data set that we publish alongside this paper. The data set contains alerts from three distinct IDSs monitoring eight executions of a multi-step attack as well as simulations of normal user behavior. To illustrate the potential of our data set, we experiment with alert prioritization as well as two open-source tools for meta-alert generation and attack graph extraction.

Paper Structure (18 sections, 2 equations, 6 figures, 3 tables)

Figures (6)

  • Figure 1: Timelines of alert type occurrences. Shaded intervals indicate multi-step attack (A/blue) and data exfiltration (B/red).
  • Figure 2: Total number of alert occurrences by detection type in each of the eight scenarios.
  • Figure 3: Average alert occurrences per minute by detection type during attack phases and normal operation.
  • Figure 4: Timelines of alerts reported by highly ranked detectors during the multi-step attack. Shaded intervals indicate network scans (A1/red), service scans (A2/cyan), WordPress scan (A3/yellow), Dirb scan (A4/blue), webshell upload and command execution (A5/green), password cracking (A6/light blue), reverse shell (A7/brown), and privilege escalation (A8/purple).
  • Figure 5: Meta-alerts and alert groups generated by the AECID-Alert-Aggregation framework for the service scans (cyan), WordPress scan (yellow), Dirb scan (blue), webshell upload (green), password cracking (light blue), reverse shell (brown), privilege escalation (purple), service stop (dark green), and data exfiltration (red) attack phases.
  • ...and 1 more figure