Generality Is Not Enough: Zero-Label Cross-System Log-Based Anomaly Detection via Knowledge-Level Collaboration

Xinlong Zhao, Tong Jia, Minghua He, Ying Li

TL;DR

The paper tackles zero-label cross-system log anomaly detection, where target-system logs exhibit both general patterns and proprietary, system-specific patterns. It proposes GeneralLog, an LLM–small model collaboration with knowledge-level routing: a training-free semantic router partitions unlabeled target logs into general and proprietary streams; a system-agnostic representation meta-learning small model handles general logs; and an LLM augmented with a retrieval-augmented generation (RAG) knowledge base handles proprietary logs. The routing uses $x_{k,\mathrm{sim}} = \min_i \mathrm{sim}_i$ with $\mathrm{sim}_i = \max_j \cos(v_i, u_j)$ and a threshold $\tau$ to enable zero-label operation. On HDFS, BGL, and OpenStack datasets, GeneralLog achieves over 90% F1, significantly outperforming baselines and reducing labeling requirements. This knowledge-level collaboration enables practical, cost-efficient cross-system anomaly detection with strong generalization.
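The routing rule above can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that each $v_i$ is the embedding of one log event in a target sequence, that each $u_j$ is an embedding from a bank of general (source-system) knowledge, and that a sequence is routed to the small model when its minimum best-match similarity is at least $\tau$. The function names, the direction of the comparison, and the default `tau=0.8` are hypothetical choices made for illustration.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def sequence_similarity(seq: List[Vector], general_bank: List[Vector]) -> float:
    """min over events i of (max over bank entries j of cos(v_i, u_j))."""
    if not seq or not general_bank:
        raise ValueError("seq and general_bank must be non-empty")
    # sim_i: how well event i is covered by its best match in the general bank.
    per_event = [max(cosine(v, u) for u in general_bank) for v in seq]
    # The least-covered event determines the score of the whole sequence.
    return min(per_event)


def route(seq: List[Vector], general_bank: List[Vector], tau: float = 0.8) -> str:
    """Assumed rule: 'general' if the score reaches tau, else 'proprietary'."""
    score = sequence_similarity(seq, general_bank)
    return "general" if score >= tau else "proprietary"


# Toy 2-D embeddings standing in for real sentence embeddings.
bank = [[1.0, 0.0], [0.0, 1.0]]
covered = [[1.0, 0.0], [0.0, 2.0]]    # every event matches a bank entry
uncovered = [[1.0, 0.0], [1.0, 1.0]]  # second event sits between bank entries

print(route(covered, bank))    # small-model stream
print(route(uncovered, bank))  # LLM + RAG stream
```

Taking the minimum over events means that a single poorly covered event is enough to send the whole sequence to the LLM stream, which matches the idea that proprietary knowledge anywhere in a sequence calls for the more expensive path. Because the rule compares embeddings only, it needs no labels from the target system.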

Abstract

Log-based anomaly detection is crucial for ensuring software system stability. However, the scarcity of labeled logs limits rapid deployment to new systems. Cross-system transfer has become an important research direction. State-of-the-art approaches perform well with a few labeled target logs, but limitations remain: small-model methods transfer general knowledge but overlook mismatches with the target system's proprietary knowledge; LLM-based methods can capture proprietary patterns but rely on a few positive examples and incur high inference cost. Existing LLM-small model collaborations route 'simple logs' to the small model and 'complex logs' to the LLM based on output uncertainty. In zero-label cross-system settings, supervised sample complexity is unavailable, and such routing does not consider knowledge separation. To address this, we propose GeneralLog, a novel LLM-small model collaborative method for zero-label cross-system log anomaly detection. GeneralLog dynamically routes unlabeled logs, letting the LLM handle 'proprietary logs' and the small model 'general logs,' enabling cross-system generalization without labeled target logs. Experiments on three public log datasets show that GeneralLog achieves over 90% F1-score under a fully zero-label setting, significantly outperforming existing methods.

Paper Structure

This paper contains 4 sections, 2 figures, 1 table.

Figures (2)

  • Figure 1: The proposed zero-label cross-system log-based anomaly detection pipeline for GeneralLog.
  • Figure 2: Routing Analysis.