MLAD: A Unified Model for Multi-system Log Anomaly Detection
Runqiang Zang, Hongcheng Guo, Jian Yang, Jiaheng Liu, Zhoujun Li, Tieqiao Zheng, Xu Shi, Liangfan Zheng, Bo Zhang
TL;DR
MLAD tackles the challenge of scalable, cross-system log anomaly detection by unifying normal-log distributions across multiple systems and addressing the identical shortcut problem. It integrates Sentence-BERT based semantic embeddings, a sparse alpha-entmax self-attention mechanism, and an EM-guided Gaussian Mixture Model to produce a discriminative energy for anomaly scoring. Through unsupervised training on normal data and cross-system transfer experiments, MLAD demonstrates superior performance over a range of baselines on BGL, HDFS, and Thunderbird, and shows robustness in fused multi-system settings. This approach offers a scalable, transferable solution for real-world log anomaly detection across heterogeneous systems, with potential implications for proactive system maintenance and cross-domain anomaly detection.
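The effect of a sparse attention normalizer can be illustrated with sparsemax, which is the alpha = 2 special case of alpha-entmax. The sketch below is a minimal NumPy illustration and not the authors' implementation: the alpha value used by MLAD is not stated here, and the function names and toy scores are assumptions chosen for the example.

```python
import numpy as np

def sparsemax(scores):
    """Sparsemax: Euclidean projection of the scores onto the probability simplex.

    This is alpha-entmax with alpha = 2 (an illustrative choice, not
    necessarily the setting used by MLAD).
    """
    z = np.asarray(scores, dtype=float)
    z_sorted = np.sort(z)[::-1]              # scores in descending order
    cumsum = np.cumsum(z_sorted)
    k = np.arange(1, z.size + 1)
    support = 1 + k * z_sorted > cumsum      # which sorted entries stay non-zero
    k_z = k[support][-1]                     # size of the support
    tau = (cumsum[k_z - 1] - 1) / k_z        # threshold subtracted from every score
    return np.maximum(z - tau, 0.0)

def softmax(scores):
    """Dense baseline: every token keeps a strictly positive weight."""
    z = np.asarray(scores, dtype=float)
    e = np.exp(z - z.max())
    return e / e.sum()

# Hypothetical attention scores for three tokens of one log sequence.
scores = [1.0, 0.8, -1.0]
sparse_weights = sparsemax(scores)
dense_weights = softmax(scores)
```

Unlike softmax, the sparse normalizer assigns exactly zero weight to the low-scoring token, so attention concentrates on a few keywords.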
Abstract
Despite rapid advances in unsupervised log anomaly detection, current mainstream models still require separate training on each system's dataset, which makes them costly to build, limits their scalability with dataset size, and leads to performance bottlenecks. Furthermore, many models lack cognitive reasoning capabilities, which makes it difficult to transfer them directly to similar systems for effective anomaly detection. In addition, like reconstruction networks, these models often suffer from the "identical shortcut" problem: because the majority of system logs are normal, they erroneously predict the normal class when confronted with rare anomalous logs, owing to reconstruction errors. To address these issues, we propose MLAD, a novel anomaly detection model that incorporates semantic relational reasoning across multiple systems. Specifically, we employ Sentence-BERT to capture the similarities between log sequences and convert them into high-dimensional learnable semantic vectors. We then revise the formulas of the attention layer to discern the significance of each keyword in the sequence, and model the overall distribution of the multi-system dataset through appropriate vector space diffusion. Lastly, we employ a Gaussian mixture model to highlight the uncertainty of the rare words involved in the "identical shortcut" problem, optimizing the vector space of the samples with the expectation-maximization (EM) algorithm. Experiments on three real-world datasets demonstrate the superiority of MLAD.
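The final step, scoring logs with a Gaussian mixture fitted by EM, can be sketched as follows. This is a minimal NumPy illustration under stated assumptions rather than MLAD itself: random vectors stand in for the semantic embeddings, covariances are diagonal, and the energy is taken to be the negative log-likelihood under the mixture, which is one common definition. All function names are hypothetical.

```python
import numpy as np

def logsumexp(a, axis):
    """Numerically stable log(sum(exp(a))) along one axis."""
    m = a.max(axis=axis, keepdims=True)
    return (m + np.log(np.exp(a - m).sum(axis=axis, keepdims=True))).squeeze(axis)

def log_gaussians(x, means, variances):
    """Log density of each sample under each diagonal Gaussian, shape (n, k)."""
    diff = x[:, None, :] - means[None, :, :]
    mahalanobis = np.sum(diff ** 2 / variances[None, :, :], axis=2)
    log_norm = np.sum(np.log(2 * np.pi * variances), axis=1)[None, :]
    return -0.5 * (mahalanobis + log_norm)

def fit_gmm(x, k, n_iter=50, seed=0):
    """Fit a diagonal-covariance GMM with plain EM (assumed setup, not the paper's)."""
    rng = np.random.default_rng(seed)
    n, _ = x.shape
    means = x[rng.choice(n, size=k, replace=False)]
    variances = np.tile(x.var(axis=0) + 1e-6, (k, 1))
    weights = np.full(k, 1.0 / k)
    for _ in range(n_iter):
        # E-step: responsibility of each component for each sample.
        log_joint = np.log(weights)[None, :] + log_gaussians(x, means, variances)
        resp = np.exp(log_joint - logsumexp(log_joint, axis=1)[:, None])
        # M-step: re-estimate mixture weights, means, and variances.
        nk = resp.sum(axis=0) + 1e-12
        weights = nk / n
        means = (resp.T @ x) / nk[:, None]
        diff2 = (x[:, None, :] - means[None, :, :]) ** 2
        variances = np.einsum("nk,nkd->kd", resp, diff2) / nk[:, None] + 1e-6
    return weights, means, variances

def energy(x, weights, means, variances):
    """Sample energy: negative log-likelihood under the fitted mixture."""
    log_joint = np.log(weights)[None, :] + log_gaussians(x, means, variances)
    return -logsumexp(log_joint, axis=1)

# Toy stand-ins for embeddings of normal logs from two systems.
rng = np.random.default_rng(42)
normal = np.concatenate([rng.normal(0.0, 0.5, (100, 4)),
                         rng.normal(5.0, 0.5, (100, 4))])
anomaly = np.full((1, 4), 20.0)              # a point far from both clusters

params = fit_gmm(normal, k=2)
normal_energy = energy(normal, *params)
anomaly_energy = energy(anomaly, *params)
```

Because the mixture is fitted on normal data only, a sample far from every component receives a high energy, and a threshold on that energy can serve as the anomaly decision.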
