A Critical Review of Common Log Data Sets Used for Evaluation of Sequence-based Anomaly Detection Techniques
Max Landauer, Florian Skopik, Markus Wurzenberger
TL;DR
This paper critically evaluates six public log data sets (HDFS, BGL, Thunderbird, OpenStack, Hadoop, ADFA) to determine their suitability for evaluating sequence-based anomaly detection. It finds that most anomalies do not manifest as changes in sequential patterns; many data sets contain signals such as new event types or constant inter-event timings that simple detectors can exploit, which can inflate the apparent performance of complex models. By implementing a small, diverse set of baselines (new events, length, ECVC, N-grams, edit distance, and event timing) and conducting multi-run semi-supervised experiments, the authors demonstrate that simple detectors often achieve competitive results and that several data sets (OpenStack, Hadoop) are poorly suited to sequence-focused evaluation. The work advocates designing new, sequence-focused benchmarks, ensuring reproducibility, and incorporating event-parameter signals to provide a more accurate assessment of anomaly detection methods for log data. The findings underscore how strongly data set design affects the evaluation and comparison of sequence-based anomaly detectors in real-world systems.
Abstract
Log data store event execution patterns that correspond to underlying workflows of systems or applications. While most logs are informative, log data also include artifacts that indicate failures or incidents. Accordingly, log data are often used to evaluate anomaly detection techniques that aim to automatically disclose unexpected or otherwise relevant system behavior patterns. Recently, detection approaches leveraging deep learning have increasingly focused on anomalies that manifest as changes of sequential patterns within otherwise normal event traces. Several publicly available data sets, such as HDFS, BGL, Thunderbird, OpenStack, and Hadoop, have since become standards for evaluating these anomaly detection techniques; however, the appropriateness of these data sets has not been closely investigated in the past. In this paper, we therefore analyze six publicly available log data sets with a focus on the manifestations of anomalies and simple techniques for their detection. Our findings suggest that most anomalies are not directly related to sequential manifestations and that advanced detection techniques are not required to achieve high detection rates on these data sets.
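To make the notion of a "simple technique" concrete, the following is a minimal sketch of two of the baseline ideas named in the TL;DR: flagging sequences that contain a previously unseen event type, and flagging sequences that contain a previously unseen N-gram. It is not the authors' implementation. The function names, the event identifiers (`E1`, `E2`, ...), the choice of `n=2`, and the toy data are all assumptions made for illustration; both detectors are fitted on normal sequences only, in the semi-supervised spirit described above.

```python
from typing import Iterable, List, Sequence, Set, Tuple

# Hypothetical helper names; event IDs such as "E1" are illustrative only.

def fit_known_events(normal_sequences: Iterable[Sequence[str]]) -> Set[str]:
    """Collect every event type observed in the normal training sequences."""
    known: Set[str] = set()
    for seq in normal_sequences:
        known.update(seq)
    return known

def has_new_event(seq: Sequence[str], known: Set[str]) -> bool:
    """Flag a sequence if it contains an event type never seen in training."""
    return any(event not in known for event in seq)

def ngrams(seq: Sequence[str], n: int) -> List[Tuple[str, ...]]:
    """Return all contiguous n-grams; empty if the sequence is shorter than n."""
    return [tuple(seq[i:i + n]) for i in range(len(seq) - n + 1)]

def fit_known_ngrams(normal_sequences: Iterable[Sequence[str]],
                     n: int = 2) -> Set[Tuple[str, ...]]:
    """Collect every n-gram observed in the normal training sequences."""
    known: Set[Tuple[str, ...]] = set()
    for seq in normal_sequences:
        known.update(ngrams(seq, n))
    return known

def has_new_ngram(seq: Sequence[str], known: Set[Tuple[str, ...]],
                  n: int = 2) -> bool:
    """Flag a sequence if any of its n-grams was never seen in training."""
    return any(gram not in known for gram in ngrams(seq, n))

# Toy training data (assumed, not taken from any of the reviewed data sets).
train = [["E1", "E2", "E3"], ["E1", "E3", "E2"]]
known_events = fit_known_events(train)
known_bigrams = fit_known_ngrams(train, n=2)

new_type_seq = ["E1", "E2", "E4"]   # contains the unseen event type E4
reordered_seq = ["E3", "E1", "E2"]  # known events, but unseen bigram (E3, E1)
normal_seq = ["E1", "E2", "E3"]     # identical to a training sequence
```

In this toy setting, `new_type_seq` is caught by the new-event check alone, whereas `reordered_seq` is only caught by the N-gram check. That distinction mirrors the paper's concern: if most anomalies in a data set already trip the new-event check, high detection scores on it say little about a model's ability to learn sequential patterns.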
