AlertBERT: A noise-robust alert grouping framework for simultaneous cyber attacks

Lukas Karner, Max Landauer, Markus Wurzenberger, Florian Skopik

TL;DR

AlertBERT addresses the challenge of grouping security alerts in large networks under high noise and concurrent attacks by marrying self-supervised alert embeddings from a masked-language-model with a density-based clustering approach guided by a time-aware metric. The framework comprises an Embedding-Phase to produce rich alert representations and a Grouping-Phase that uses a time-cosine distance to form robust alert groups, improving upon time-delta baselines. The authors validate AlertBERT on augmented versions of the AIT-ADS dataset (AIT-ADS-A), showing superior ROC-AUC performance across multiple scenarios and noise levels, with strong improvements in overlapping attack contexts. They also discuss limitations, such as data drift and the need for dataset realism, and outline future work to enhance tokenisation, training objectives, and dataset scale for real-world SOC deployment.
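The Grouping-Phase idea summarised above can be illustrated with a small sketch. The summary does not give the formula of the time-cosine distance, so the form used here is an assumption: two alerts are comparable only if their timestamps lie within `max_gap` seconds of each other, and their distance is then the cosine distance between their embeddings. The names `Alert`, `time_cosine_distance`, and `dbscan`, the parameter values, and the hand-made two-dimensional embeddings (standing in for masked-language-model output) are all hypothetical and not taken from the AlertBERT implementation.

```python
import math
from dataclasses import dataclass


@dataclass
class Alert:
    timestamp: float        # seconds
    embedding: list         # stand-in for a learned alert representation


def cosine_distance(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 1.0
    return 1.0 - dot / (norm_u * norm_v)


def time_cosine_distance(a, b, max_gap):
    # ASSUMED form: infinite distance outside the time window,
    # cosine distance of the embeddings inside it.
    if abs(a.timestamp - b.timestamp) > max_gap:
        return math.inf
    return cosine_distance(a.embedding, b.embedding)


def dbscan(items, dist, eps, min_pts):
    """Plain DBSCAN with a custom metric; returns one label per item, -1 = noise."""
    n = len(items)
    labels = [None] * n  # None = not yet visited

    def neighbours(i):
        # A point counts as its own neighbour, so min_pts includes the point itself.
        return [j for j in range(n) if dist(items[i], items[j]) <= eps]

    cluster = 0
    for i in range(n):
        if labels[i] is not None:
            continue
        nb = neighbours(i)
        if len(nb) < min_pts:
            labels[i] = -1
            continue
        labels[i] = cluster
        queue = [j for j in nb if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # former noise point becomes a border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nb_j = neighbours(j)
            if len(nb_j) >= min_pts:
                queue.extend(nb_j)  # j is a core point, keep expanding
        cluster += 1
    return labels


# Two attacks interleaved in time, plus one isolated false-positive alert.
alerts = [
    Alert(0.0, [1.0, 0.0]), Alert(1.0, [0.9, 0.1]), Alert(2.0, [1.0, 0.05]),
    Alert(0.5, [0.0, 1.0]), Alert(1.5, [0.1, 0.9]), Alert(2.5, [0.05, 1.0]),
    Alert(100.0, [1.0, 1.0]),
]
labels = dbscan(
    alerts,
    dist=lambda a, b: time_cosine_distance(a, b, max_gap=5.0),
    eps=0.1,
    min_pts=2,
)
print(labels)  # [0, 0, 0, 1, 1, 1, -1]
```

In this toy setting the two interleaved attacks end up in separate groups because their embeddings point in different directions, while the isolated alert is labelled as noise. Any real deployment would take the embeddings from the trained model and tune `eps`, `min_pts`, and the time window on data.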

Abstract

Automated detection of cyber attacks is a critical capability for counteracting their growing volume and sophistication. However, the high number of security alerts issued by intrusion detection systems leads to alert fatigue among analysts working in security operations centres (SOCs), which in turn causes slow reaction times and incorrect decision making. Alert grouping, the clustering of security alerts according to their underlying causes, can significantly reduce the number of distinct items analysts have to consider. Unfortunately, conventional time-based alert grouping solutions are unsuitable for large-scale computer networks characterised by high levels of false-positive alerts and simultaneously occurring attacks. To address these limitations, we propose AlertBERT, a self-supervised framework designed to group alerts from isolated or concurrent attacks in noisy environments. Our open-source implementation of AlertBERT leverages masked-language-models and density-based clustering to support both real-time and forensic operation. To evaluate our framework, we further introduce a novel data augmentation method that enables flexible control over noise levels and simulates concurrent attack occurrences. Based on the data sets generated through this method, we demonstrate that AlertBERT consistently outperforms conventional time-based grouping techniques, achieving superior accuracy in identifying correct alert groups.
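To make the limitation of time-based grouping concrete, here is a minimal sketch of one common reading of such a baseline: alerts are sorted by timestamp and a new group starts whenever the gap to the previous alert exceeds a threshold. The function name `group_by_time_delta`, the threshold value, and the timestamps are illustrative assumptions, not the exact baseline evaluated by the authors.

```python
def group_by_time_delta(timestamps, delta):
    """Assign a group label to each alert using inter-arrival time only.

    A new group begins whenever the gap to the previous alert (in time
    order) is larger than `delta`. Labels are returned in input order.
    """
    order = sorted(range(len(timestamps)), key=lambda i: timestamps[i])
    labels = [0] * len(timestamps)
    group = -1
    previous = None
    for i in order:
        if previous is None or timestamps[i] - previous > delta:
            group += 1
        labels[i] = group
        previous = timestamps[i]
    return labels


# Alerts 0-2 belong to one attack, alerts 3-5 to a second attack that
# runs at the same time, and alert 6 is an unrelated late alert.
timestamps = [0.0, 1.0, 2.0, 0.5, 1.5, 2.5, 100.0]
time_labels = group_by_time_delta(timestamps, delta=5.0)
print(time_labels)  # [0, 0, 0, 0, 0, 0, 1]
```

Because only timestamps are considered, the two simultaneous attacks collapse into a single group, and any false-positive alert arriving in the same window would be absorbed as well. This is the failure mode the abstract attributes to time-based grouping under noise and concurrent attacks.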

Paper Structure (26 sections, 2 equations, 6 figures, 6 tables, 2 algorithms)

This paper contains 26 sections, 2 equations, 6 figures, 6 tables, 2 algorithms.

Figures (6)

  • Figure 1: An illustration of the AlertBERT framework showing its components, the data flow among them, and the parameters to set for each. For each of the two phases of the framework, its components and their internal data states are highlighted.
  • Figure 2: A simplified example of an alert in JSON format, produced by the AMiner IDS, as found in the AIT Alert Dataset.
  • Figure 3: An illustration of a neighbourhood around an alert defined by the time-cosine-metric in the alert representation space.
  • Figure 4: An illustration of our data augmentation method for creating AIT-ADS-A.
  • Figure 5: ROC-curves of AlertBERT and time-delta on the simul-attacks configuration of AIT-ADS-A.
  • ...and 1 more figure