LogAction: Consistent Cross-system Anomaly Detection through Logs via Active Domain Adaptation
Chiming Duan, Minghua He, Pei Xiao, Tong Jia, Xin Zhang, Zhewei Zhong, Xiang Luo, Yan Niu, Lingzhe Zhang, Yifan Wu, Siyu Yu, Weijie Hong, Ying Li, Gang Huang
TL;DR
LogAction tackles cross-system log anomaly detection under label scarcity by combining transfer learning with active learning in a three-phase pipeline: Log Parser, Encoding, and Active Domain Adaptation. It aligns source and target log distributions with contrastive encoding, then uses two sampling strategies, free energy-based and uncertainty-based, to select highly informative target logs for labeling and fine-tuning. The result is a robust target classifier trained with minimal labels. On three public datasets (six source-target combinations), LogAction achieves an average F1 score of 93.01% using only 2% of labels, outperforming state-of-the-art baselines by up to 26.28%. This makes high-precision log-based anomaly detection practical for new systems where labeled data is scarce.
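The contrastive encoding step can be illustrated with a generic InfoNCE-style loss. This is a minimal sketch under stated assumptions, not LogAction's implementation. Several details are assumptions made for illustration: each source-log embedding (anchor) is paired with a matching target-log embedding (positive), similarity is cosine similarity scaled by a temperature `tau`, and the other positives in the batch act as negatives.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def info_nce(anchors: List[Vector], positives: List[Vector], tau: float = 0.1) -> float:
    """Mean InfoNCE loss over a batch.

    Assumption (illustrative only): anchors[i] is a source-log embedding and
    positives[i] is its matching target-log embedding; every other positive
    in the batch serves as a negative for anchors[i].
    """
    total = 0.0
    for i, anchor in enumerate(anchors):
        logits = [cosine(anchor, p) / tau for p in positives]
        m = max(logits)  # subtract the max before exp() for numerical stability
        log_denominator = m + math.log(sum(math.exp(z - m) for z in logits))
        total += log_denominator - logits[i]  # -log softmax probability of the true pair
    return total / len(anchors)
```

Minimizing this loss pulls matched source and target embeddings together and pushes mismatched ones apart. For instance, `info_nce([[1, 0], [0, 1]], [[0.9, 0.1], [0.1, 0.9]])` gives a loss close to zero, while swapping the two positives gives a much larger loss.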
Abstract
Log-based anomaly detection is an essential task for ensuring the reliability and performance of software systems. However, the performance of existing anomaly detection methods relies heavily on labeled data, and labeling a large volume of logs is highly challenging. To address this issue, many approaches based on transfer learning and active learning have been proposed. Nevertheless, their effectiveness is hindered by issues such as the gap between the source and target system data distributions and the cold-start problem. In this paper, we propose LogAction, a novel log-based anomaly detection model based on active domain adaptation. LogAction integrates transfer learning and active learning techniques. On the one hand, it uses labeled data from a mature system to train a base model, mitigating the cold-start issue in active learning. On the other hand, LogAction uses free energy-based sampling and uncertainty-based sampling to select logs located at the distribution boundaries for manual labeling, thereby addressing the data distribution gap in transfer learning with minimal human labeling effort. Experimental results on six different combinations of datasets demonstrate that LogAction achieves an average F1 score of 93.01% with only 2% of manual labels, outperforming state-of-the-art methods by up to 26.28%. Website: https://logaction.github.io
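The two sampling strategies can be sketched as a two-stage selection over the logits that the source-trained base model produces for unlabeled target logs. This is a hedged illustration in the spirit of energy-based active domain adaptation, not LogAction's actual procedure. The free energy `F(x) = -T * log(sum_k exp(f_k(x) / T))` follows the standard energy-based formulation. Three other details are assumptions: the order of the two stages, the use of softmax entropy as the uncertainty measure, and the hypothetical `energy_ratio` parameter.

```python
import math
from typing import List, Sequence


def free_energy(logits: Sequence[float], temperature: float = 1.0) -> float:
    """F(x) = -T * log(sum_k exp(f_k(x) / T)).

    Higher free energy means the source-trained model finds the sample less
    familiar, i.e. it looks less like the source distribution.
    """
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # log-sum-exp trick for numerical stability
    return -temperature * (m + math.log(sum(math.exp(s - m) for s in scaled)))


def entropy(logits: Sequence[float]) -> float:
    """Shannon entropy of softmax(logits); higher means more uncertain.

    Assumption: entropy is used here as the uncertainty measure.
    """
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    return -sum(p * math.log(p) for p in probs if p > 0)


def select_for_labeling(all_logits: List[Sequence[float]], budget: int,
                        energy_ratio: float = 0.5) -> List[int]:
    """Return indices of target logs to send for manual labeling.

    Stage 1 keeps the highest-free-energy candidates (target-specific logs).
    Stage 2 picks the most uncertain of those (near the decision boundary).
    `energy_ratio` is a hypothetical knob for the size of the stage-1 pool.
    """
    n_candidates = max(budget, int(len(all_logits) * energy_ratio))
    by_energy = sorted(range(len(all_logits)),
                       key=lambda i: free_energy(all_logits[i]), reverse=True)
    candidates = by_energy[:n_candidates]
    by_uncertainty = sorted(candidates,
                            key=lambda i: entropy(all_logits[i]), reverse=True)
    return by_uncertainty[:budget]
```

For example, with `logits = [[5.0, -5.0], [0.3, 0.0], [0.05, -0.05], [4.0, 3.9]]` and `budget=2`, the call returns `[2, 1]`. Sample 3 is nearly as uncertain as sample 2, but its large logits give it low free energy (it looks source-like), so the energy stage filters it out before the uncertainty stage runs.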
