LogAction: Consistent Cross-system Anomaly Detection through Logs via Active Domain Adaptation

Chiming Duan, Minghua He, Pei Xiao, Tong Jia, Xin Zhang, Zhewei Zhong, Xiang Luo, Yan Niu, Lingzhe Zhang, Yifan Wu, Siyu Yu, Weijie Hong, Ying Li, Gang Huang

TL;DR

LogAction tackles cross-system log anomaly detection under labeling scarcity by uniting transfer learning with active learning through a three-phase pipeline: Log Parser, Encoding, and Active Domain Adaptation. It aligns source and target log distributions with contrastive encoding, then uses two sampling strategies (free energy-based and uncertainty-based) to select highly informative target logs for labeling and fine-tuning, yielding a robust target classifier with minimal labels. Empirical results on three public datasets show an average $F1$-score of $93.01\%$ using only $2\%$ of labels, outperforming state-of-the-art baselines by up to $26.28\%$. This makes high-precision log-based anomaly detection practical for new systems with scarce labeled data.

Abstract

Log-based anomaly detection is an essential task for ensuring the reliability and performance of software systems. However, the performance of existing anomaly detection methods relies heavily on labeling, while labeling a large volume of logs is highly challenging. To address this issue, many approaches based on transfer learning and active learning have been proposed. Nevertheless, their effectiveness is hindered by issues such as the gap between source and target system data distributions and cold-start problems. In this paper, we propose LogAction, a novel log-based anomaly detection model based on active domain adaptation. LogAction integrates transfer learning and active learning techniques. On the one hand, it uses labeled data from a mature system to train a base model, mitigating the cold-start issue in active learning. On the other hand, LogAction utilizes free energy-based sampling and uncertainty-based sampling to select logs located at the distribution boundaries for manual labeling, thus addressing the data distribution gap in transfer learning with minimal human labeling effort. Experimental results on six different combinations of datasets demonstrate that LogAction achieves an average 93.01% F1 score with only 2% of manual labels, outperforming some state-of-the-art methods by 26.28%. Website: https://logaction.github.io
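
The two sampling strategies named in the abstract can be illustrated with a small sketch. The abstract does not give the scoring formulas, so everything below is an assumption: free energy is taken as the negative temperature-scaled log-sum-exp of the classifier logits (a common choice in energy-based active domain adaptation), uncertainty is taken as softmax entropy, and the two are combined as a two-stage filter. The function names and the parameters `n_energy` and `n_final` are hypothetical and are not taken from LogAction.

```python
import math
from typing import List, Sequence


def free_energy(logits: Sequence[float], temperature: float = 1.0) -> float:
    """Assumed score: F(x) = -T * log(sum_k exp(logit_k / T)), computed stably."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    return -temperature * (m + math.log(sum(math.exp(s - m) for s in scaled)))


def entropy(logits: Sequence[float]) -> float:
    """Assumed uncertainty score: Shannon entropy of softmax(logits)."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def select_for_labeling(
    batch_logits: Sequence[Sequence[float]], n_energy: int, n_final: int
) -> List[int]:
    """Hypothetical two-stage selection over unlabeled target samples.

    Stage 1 keeps the n_energy samples with the highest free energy
    (assumed to lie farthest from the source distribution).
    Stage 2 keeps the n_final most uncertain samples among those.
    Returns the selected indices in ascending order.
    """
    by_energy = sorted(
        range(len(batch_logits)),
        key=lambda i: free_energy(batch_logits[i]),
        reverse=True,
    )
    candidates = by_energy[:n_energy]
    by_uncertainty = sorted(
        candidates, key=lambda i: entropy(batch_logits[i]), reverse=True
    )
    return sorted(by_uncertainty[:n_final])


# Made-up logits (normal, anomalous) for four target log sequences.
target_logits = [
    [6.0, -2.0],  # confident prediction with large logits
    [0.1, 0.0],   # small logits, nearly tied classes
    [0.5, -0.5],  # small logits, mild preference
    [4.0, 3.9],   # large logits, nearly tied classes
]
selected = select_for_labeling(target_logits, n_energy=2, n_final=1)
print("indices to send for manual labeling:", selected)
```

Under these assumptions, the selected indices would go to a human annotator, and the classifier would then be fine-tuned on the newly labeled target logs. The labeling budget (2% in the reported experiments) would bound how many samples are selected in total.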

Paper Structure

This paper contains 22 sections, 9 equations, 6 figures, 4 tables, 1 algorithm.

Figures (6)

  • Figure 1: Two log sequences from different systems (BGL and ThunderBird). Although both express the same error (file or directory does not exist), their formats differ markedly.
  • Figure 2: Overview of LogAction. LogAction includes three main phases: Log Parser, Encoding, and Active Domain Adaptation. First, the raw system logs are labeled and parsed into log event sequences. Second, LogAction encodes the log sequences from both the source system and the target system, mapping them to a similar distribution. Finally, LogAction is initially trained using labeled log vectors from the source system, and is then fine-tuned with a very limited amount of target system logs via active learning to adapt to cross-system anomaly detection.
  • Figure 3: The Encoding Phase. In the initial stage, log sequences from the source system and the target system originate from distinct distributions (located in different hyperplanes). Employing contrastive learning, the objective is to map the distributions of normal log sequences from the source system and normal as well as anomalous log sequences from the target system to similar distributions (situated within the same hyperplane).
  • Figure 4: Overview of the encoder.
  • Figure 5: Human labeling efforts: human labeling amount and LogAction F1-score curve.
  • ...and 1 more figures