Cross-System Software Log-based Anomaly Detection Using Meta-Learning
Yuqing Wang, Mika V. Mäntylä, Jesse Nyyssölä, Ke Ping, Liqiang Wang
TL;DR
The paper tackles cross-system log-based anomaly detection in modern software systems, where labeling costs are high and logs evolve over time. It proposes CroSysLog, which combines neural log-event representations based on subword-BERT embeddings with a MAML-based meta-learning framework, enabling rapid adaptation to new target systems using only a few labeled log events. The method supports many-to-many transfer by training on multiple source systems and evaluating on multiple target systems. Evaluated with four open datasets, using BGL and Liberty as source systems and Thunderbird and Spirit as targets, it achieves state-of-the-art or competitive F1 scores while being more efficient. The work demonstrates practical applicability for AIOps through efficient adaptation and reduced labeling requirements; future work will extend it to more diverse systems and real-world deployments.
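The meta-learning idea above can be made concrete with a minimal sketch. It is not CroSysLog's implementation. It swaps in the first-order approximation of MAML (FOMAML), uses a toy logistic-regression detector instead of a neural model, and feeds it 2-d vectors from a hypothetical `make_system` helper as stand-ins for the subword-BERT log-event representations. All function names, data, and hyperparameters are illustrative assumptions.

```python
import math
import random

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def predict(params, x):
    """Probability that embedding x is an anomalous log event."""
    w, b = params
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)

def grad(params, data):
    """Gradient of mean binary cross-entropy over labeled (x, y) pairs."""
    w, _ = params
    gw, gb = [0.0] * len(w), 0.0
    for x, y in data:
        err = predict(params, x) - y
        gw = [g + err * xi for g, xi in zip(gw, x)]
        gb += err
    n = len(data)
    return [g / n for g in gw], gb / n

def sgd_step(params, g, lr):
    (w, b), (gw, gb) = params, g
    return [wi - lr * gi for wi, gi in zip(w, gw)], b - lr * gb

def adapt(params, support, inner_lr=0.5, steps=5):
    """Inner loop: fine-tune from the shared init on one system's few labels."""
    for _ in range(steps):
        params = sgd_step(params, grad(params, support), inner_lr)
    return params

def meta_train(tasks, dim, meta_lr=0.1, epochs=50, seed=0):
    """Outer loop (first-order MAML) over (support, query) source tasks."""
    rng = random.Random(seed)
    params = ([0.0] * dim, 0.0)
    for _ in range(epochs):
        support, query = rng.choice(tasks)
        adapted = adapt(params, support)
        # First-order approximation: the query-set gradient at the adapted
        # parameters is applied directly to the shared initialization.
        params = sgd_step(params, grad(adapted, query), meta_lr)
    return params

def make_system(rng, scale, k_shot=4, n_query=10):
    """Hypothetical toy system: 2-d stand-ins for log-event embeddings.

    Anomalies (label 1) lie near +scale on dim 0, normal events near -scale;
    dim 1 is noise. Returns a (support, query) pair with balanced classes.
    """
    def sample(label):
        center = scale if label == 1 else -scale
        return [center + rng.uniform(-0.3, 0.3), rng.uniform(-0.3, 0.3)]

    def batch(n_per_class):
        return [(sample(y), y) for y in (0, 1) for _ in range(n_per_class)]

    return batch(k_shot), batch(n_query)

rng = random.Random(42)
sources = [make_system(rng, s) for s in (1.0, 1.5, 1.2)]  # toy source systems
init = meta_train(sources, dim=2)
support, query = make_system(rng, 0.8)  # toy unseen target system
adapted = adapt(init, support)  # uses only 8 labeled target events
acc = sum((predict(adapted, x) > 0.5) == (y == 1) for x, y in query) / len(query)
```

In this sketch, `adapt` plays the role of the few-shot step run on a new target system, while `meta_train` only shapes the shared starting point from the source systems, which is why the target needs just a handful of labeled events.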
Abstract
Modern software systems produce vast amounts of logs, serving as an essential resource for anomaly detection. Artificial Intelligence for IT Operations (AIOps) tools have been developed to automate log-based anomaly detection for software systems. Three practical challenges are widely recognized in this field: high data labeling costs, evolving logs in dynamic systems, and adaptability across different systems. In this paper, we propose CroSysLog, an AIOps tool for log-event level anomaly detection, specifically designed in response to these challenges. Following prior approaches, CroSysLog uses a neural representation approach to gain a nuanced understanding of logs and generate representations for individual log events accordingly. CroSysLog can be trained on source systems with sufficient labeled logs from open datasets to achieve robustness, and then efficiently adapt to target systems with a few labeled log events for effective anomaly detection. We evaluate CroSysLog using open datasets of four large-scale distributed supercomputing systems: BGL, Thunderbird, Liberty, and Spirit. To train and evaluate CroSysLog, we use random log splits from these systems that maintain the chronological order of consecutive log events. These splits are widely distributed across the one- to two-year span of each system's log collection, thereby capturing the evolving nature of the logs in each system. Our results show that, after being trained on Liberty and BGL as source systems, CroSysLog can efficiently adapt to the target systems Thunderbird and Spirit using a few labeled log events from each target system, effectively performing anomaly detection for these target systems. The results demonstrate that CroSysLog is a practical, scalable, and adaptable tool for log-event level anomaly detection in operational and maintenance contexts of software systems.
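The abstract does not spell out how the chronology-preserving random splits were drawn. One plausible reading, sketched here under stated assumptions, is to place a fixed number of fixed-length, non-overlapping windows of consecutive events at random positions across the whole log. The function `contiguous_random_splits`, the window length, and the toy event list are all hypothetical.

```python
import random

def contiguous_random_splits(n_events, n_splits, split_len, seed=0):
    """Pick non-overlapping windows of consecutive log events at random positions.

    Returns sorted (start, end) index pairs. Each window keeps its events in
    their original chronological order, and windows may fall anywhere in the log.
    """
    free = n_events - n_splits * split_len
    if free < 0:
        raise ValueError("log too short for the requested splits")
    rng = random.Random(seed)
    # Each sorted offset is the number of unused events before window i;
    # sorting the offsets guarantees that consecutive windows never overlap.
    offsets = sorted(rng.randint(0, free) for _ in range(n_splits))
    return [(o + i * split_len, o + (i + 1) * split_len) for i, o in enumerate(offsets)]

# Usage with a hypothetical chronologically ordered log.
events = [f"event-{t}" for t in range(1000)]
windows = contiguous_random_splits(len(events), n_splits=5, split_len=50, seed=7)
splits = [events[s:e] for s, e in windows]
```

Sampling offsets and then shifting each by the lengths of the windows before it places every window uniformly at random while keeping them disjoint, so no split mixes events from distant periods.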
