AlertBERT: A noise-robust alert grouping framework for simultaneous cyber attacks
Lukas Karner, Max Landauer, Markus Wurzenberger, Florian Skopik
TL;DR
AlertBERT addresses the challenge of grouping security alerts in large networks under high noise and concurrent attacks by marrying self-supervised alert embeddings from a masked-language-model with a density-based clustering approach guided by a time-aware metric. The framework comprises an Embedding-Phase to produce rich alert representations and a Grouping-Phase that uses a time-cosine distance to form robust alert groups, improving upon time-delta baselines. The authors validate AlertBERT on augmented versions of the AIT-ADS dataset (AIT-ADS-A), showing superior ROC-AUC performance across multiple scenarios and noise levels, with strong improvements in overlapping attack contexts. They also discuss limitations, such as data drift and the need for dataset realism, and outline future work to enhance tokenisation, training objectives, and dataset scale for real-world SOC deployment.
Abstract
Automated detection of cyber attacks is a critical capability to counteract their growing volume and sophistication. However, the high number of security alerts issued by intrusion detection systems leads to alert fatigue among analysts working in security operations centres (SOCs), which in turn causes slow reaction times and incorrect decisions. Alert grouping, which refers to the clustering of security alerts according to their underlying causes, can significantly reduce the number of distinct items analysts have to consider. Unfortunately, conventional time-based alert grouping solutions are unsuitable for large-scale computer networks characterised by high levels of false positive alerts and simultaneously occurring attacks. To address these limitations, we propose AlertBERT, a self-supervised framework designed to group alerts from isolated or concurrent attacks in noisy environments. Our open-source implementation of AlertBERT leverages masked-language-models and density-based clustering to support both real-time and forensic operation. To evaluate our framework, we further introduce a novel data augmentation method that enables flexible control over noise levels and simulates concurrent attack occurrences. Based on the data sets generated through this method, we demonstrate that AlertBERT consistently outperforms conventional time-based grouping techniques, achieving superior accuracy in identifying correct alert groups.
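To make the grouping idea concrete, the following is a minimal sketch of density-based clustering over alerts under a distance that combines time and embedding similarity. It is not the AlertBERT implementation: the text above does not define the time-cosine distance, so the additive form in `time_cosine_distance`, the `time_scale` parameter, the toy two-dimensional embeddings, and the plain DBSCAN routine are all assumptions chosen for illustration.

```python
import math

def cosine_distance(u, v):
    """1 - cosine similarity; defined as 1.0 if either vector is zero."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / norm if norm else 1.0

def time_cosine_distance(a, b, time_scale=60.0):
    """Hypothetical combined metric (an assumption, not the paper's formula):
    cosine distance of the embeddings plus the scaled time gap in seconds."""
    (t_a, e_a), (t_b, e_b) = a, b
    return cosine_distance(e_a, e_b) + abs(t_a - t_b) / time_scale

def dbscan(points, eps, min_samples, dist):
    """Plain DBSCAN with a caller-supplied distance; label -1 marks noise."""
    NOISE, UNSEEN = -1, None
    labels = [UNSEEN] * len(points)

    def neighbours(i):
        # Includes i itself, as in the usual DBSCAN definition.
        return [j for j in range(len(points)) if dist(points[i], points[j]) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not UNSEEN:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_samples:
            labels[i] = NOISE
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # former noise becomes a border point
            if labels[j] is not UNSEEN:
                continue
            labels[j] = cluster
            reachable = neighbours(j)
            if len(reachable) >= min_samples:  # j is a core point: keep expanding
                queue.extend(reachable)
    return labels

# Toy alerts as (timestamp in seconds, embedding): two attacks that overlap
# in time but differ in content, plus one isolated false positive.
alerts = [
    (0.0, [1.0, 0.0]), (5.0, [0.9, 0.1]), (10.0, [1.0, 0.1]),   # attack A
    (2.0, [0.0, 1.0]), (7.0, [0.1, 0.9]), (12.0, [0.1, 1.0]),   # attack B
    (300.0, [0.7, 0.7]),                                        # noise
]

def time_only(a, b):
    """Time-delta baseline for comparison: ignores alert content."""
    return abs(a[0] - b[0]) / 60.0

combined_labels = dbscan(alerts, eps=0.3, min_samples=2, dist=time_cosine_distance)
baseline_labels = dbscan(alerts, eps=0.3, min_samples=2, dist=time_only)
print(combined_labels)  # [0, 0, 0, 1, 1, 1, -1]
print(baseline_labels)  # [0, 0, 0, 0, 0, 0, -1]
```

In this toy setting the time-only baseline merges the two overlapping attacks into a single group, whereas the combined distance keeps them apart and still marks the isolated alert as noise. The values of `eps`, `min_samples`, and `time_scale` are arbitrary here and would need tuning on real alert data.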
