FusionLog: Cross-System Log-based Anomaly Detection via Fusion of General and Proprietary Knowledge

Xinlong Zhao, Tong Jia, Minghua He, Xixuan Yang, Ying Li

TL;DR

This work tackles zero-label cross-system log-based anomaly detection, where a target system provides no labeled data. It introduces FusionLog, a two-branch framework that first uses a training-free semantic router to partition unlabeled target logs into general and proprietary streams, then applies a small system-agnostic model to general logs while progressively distilling proprietary knowledge via iterative LLM–SM collaboration and RAG-based prompting. The key contributions include the general/proprietary log conceptualization, a semantic routing-based processing pipeline, and a multi-round distillation fusion method that integrates pseudo-labels from an LLM with a lightweight model. Empirical results on three public datasets (HDFS, BGL, OpenStack) show FusionLog achieving over 90% F1 under fully zero-label conditions and outperforming state-of-the-art cross-system methods, enabling practical zero-label deployment for new web systems.

Abstract

Log-based anomaly detection is critical for ensuring the stability and reliability of web systems. A key problem in this task is the lack of sufficient labeled logs, which limits rapid deployment in new systems. Existing works usually leverage large-scale labeled logs from a mature web system and a small amount of labeled logs from a new system, using transfer learning to extract and generalize general knowledge across both domains. However, these methods focus solely on the transfer of general knowledge and neglect the disparity and potential mismatch between such knowledge and the proprietary knowledge of the target system, which constrains performance. To address this limitation, we propose FusionLog, a novel zero-label cross-system log-based anomaly detection method that fuses general and proprietary knowledge, enabling cross-system generalization without any labeled target logs. Specifically, we first design a training-free router based on semantic similarity that dynamically partitions unlabeled target logs into 'general logs' and 'proprietary logs.' For general logs, FusionLog employs a small model based on system-agnostic representation meta-learning for direct training and inference, inheriting the general anomaly patterns shared between the source and target systems. For proprietary logs, we iteratively generate pseudo-labels and fine-tune the small model through multi-round collaborative knowledge distillation and fusion between a large language model (LLM) and the small model (SM), enhancing its capability to recognize anomaly patterns specific to the target system. Experimental results on three public log datasets from different systems show that FusionLog achieves over 90% F1-score under a fully zero-label setting, significantly outperforming state-of-the-art cross-system log-based anomaly detection methods.
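To make the routing step concrete, the following is a minimal sketch of a training-free, similarity-based router. It is an illustration under stated assumptions, not the authors' implementation: the bag-of-words `embed` function is a toy stand-in for whatever semantic encoder the paper uses, and the names `route`, `cosine`, and the `threshold` default of 0.5 are hypothetical. The only idea taken from the abstract is that each unlabeled target log is compared against source-system logs and sent to the general or proprietary stream according to its semantic similarity.

```python
import math
import re
from collections import Counter


def embed(message):
    """Toy stand-in for a semantic encoder (assumption): bag-of-words counts."""
    return Counter(re.findall(r"[a-z]+", message.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def route(target_logs, source_logs, threshold=0.5):
    """Split target logs into (general, proprietary) without any training.

    A target log is 'general' if its best similarity to any source log
    reaches the threshold (a hypothetical value), otherwise 'proprietary'.
    """
    source_vecs = [embed(s) for s in source_logs]
    general, proprietary = [], []
    for log in target_logs:
        vec = embed(log)
        best = max((cosine(vec, s) for s in source_vecs), default=0.0)
        (general if best >= threshold else proprietary).append(log)
    return general, proprietary


source = ["connection refused by remote host", "failed to write block to disk"]
target = ["connection refused by host", "nova compute instance spawned successfully"]
general, proprietary = route(target, source)
```

In this toy run the first target message shares most of its tokens with a source message and lands in the general stream, while the second shares none and is routed as proprietary. In the pipeline the abstract describes, the general stream would go to the small system-agnostic model and the proprietary stream to the LLM and small-model distillation rounds.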

Paper Structure

This paper contains 21 sections, 2 equations, 5 figures, 2 tables.

Figures (5)

  • Figure 1: General and Proprietary Log Examples.
  • Figure 2: The proposed zero-label cross-system log-based anomaly detection pipeline for FusionLog.
  • Figure 3: Routing Threshold Changes and Effects.
  • Figure 4: Round Variation and Threshold Strategy Comparison.
  • Figure 5: FusionLog