Generality Is Not Enough: Zero-Label Cross-System Log-Based Anomaly Detection via Knowledge-Level Collaboration
Xinlong Zhao, Tong Jia, Minghua He, Ying Li
TL;DR
The paper tackles zero-label cross-system log anomaly detection, where target-system logs contain both general patterns and proprietary, system-specific patterns. It proposes GeneralLog, an LLM–small model collaboration with knowledge-level routing. A training-free semantic router partitions unlabeled target logs into a general stream and a proprietary stream. A small model trained with system-agnostic representation meta-learning handles the general logs, and an LLM augmented with a retrieval-augmented generation (RAG) knowledge base handles the proprietary logs. The router scores each log $x_k$ as $\mathrm{sim}(x_k) = \min_i \mathrm{sim}_i$, where $\mathrm{sim}_i = \max_j \cos(v_i, u_j)$, and compares this score against a threshold $\tau$. Because routing needs no labels, the whole pipeline operates zero-label. On the HDFS, BGL, and OpenStack datasets, GeneralLog achieves over 90% F1, significantly outperforming baselines without using any labeled target logs. This knowledge-level collaboration makes cross-system anomaly detection practical and cost-efficient, with strong generalization.
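The routing score can be sketched in a few lines. This is a minimal illustration, not the authors' implementation. It assumes that the $v_i$ are embedding vectors extracted from one log $x_k$ and the $u_j$ are embedding vectors representing general (cross-system) knowledge. It also assumes the rule "score $\ge \tau$ means general, otherwise proprietary". The function names `routing_score` and `route` are hypothetical. The min-max structure means that a single poorly covered $v_i$ is enough to push the whole log toward the proprietary stream.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # assumption: zero vectors are treated as dissimilar
    return dot / (norm_a * norm_b)


def routing_score(log_vectors: List[Vector], knowledge_vectors: List[Vector]) -> float:
    """sim(x_k) = min_i sim_i, with sim_i = max_j cos(v_i, u_j).

    log_vectors: the v_i of one log (assumed embeddings; must be non-empty).
    knowledge_vectors: the u_j of the general knowledge (assumed; must be non-empty).
    """
    # For each v_i, find its best match among the u_j ...
    per_vector_best = [max(cosine(v, u) for u in knowledge_vectors) for v in log_vectors]
    # ... then let the least-covered v_i determine the log's score.
    return min(per_vector_best)


def route(log_vectors: List[Vector], knowledge_vectors: List[Vector], tau: float) -> str:
    """Hypothetical thresholding rule: score >= tau -> 'general', else 'proprietary'."""
    score = routing_score(log_vectors, knowledge_vectors)
    return "general" if score >= tau else "proprietary"
```

For example, with knowledge vectors `[[1, 0], [0, 1]]`, a log whose vectors are `[[2, 0], [0, 3]]` scores 1.0 and is routed as general at `tau=0.8`. A log containing `[-1, -1]` scores about -0.71 and is routed as proprietary.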
Abstract
Log-based anomaly detection is crucial for ensuring software system stability. However, the scarcity of labeled logs limits rapid deployment to new systems, making cross-system transfer an important research direction. State-of-the-art approaches perform well with a few labeled target logs, but limitations remain. Small-model methods transfer general knowledge but overlook mismatches with the target system's proprietary knowledge. LLM-based methods can capture proprietary patterns, but they rely on a few positive examples and incur high inference cost. Existing LLM–small model collaborations route 'simple logs' to the small model and 'complex logs' to the LLM based on output uncertainty. In zero-label cross-system settings, however, no supervised signal of sample complexity is available, and such routing does not account for the separation between general and proprietary knowledge. To address this, we propose GeneralLog, a novel LLM–small model collaborative method for zero-label cross-system log anomaly detection. GeneralLog dynamically routes unlabeled logs, letting the LLM handle 'proprietary logs' and the small model handle 'general logs'. This enables cross-system generalization without labeled target logs. Experiments on three public log datasets show that GeneralLog achieves over 90% F1-score under a fully zero-label setting, significantly outperforming existing methods.
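The division of labor between the two models can be sketched as a simple dispatcher. This is a hypothetical illustration, not GeneralLog's code. The router, the small model, and the RAG-augmented LLM are all stand-in callables (`is_general`, `small_model`, `llm_with_rag`) that return a boolean. In particular, `is_general` stands in for the semantic router, which this sketch does not implement. The sketch shows one plausible consequence of knowledge-level routing: the expensive LLM is called only on logs routed as proprietary, which is tracked in the `llm_calls` counter.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class CollaborativeDetector:
    """Hypothetical LLM-small model dispatcher driven by a knowledge-level router."""

    is_general: Callable[[str], bool]    # stand-in for the semantic router
    small_model: Callable[[str], bool]   # stand-in detector for general logs; True = anomalous
    llm_with_rag: Callable[[str], bool]  # stand-in detector for proprietary logs; True = anomalous
    llm_calls: int = 0                   # counts how often the costly LLM path is used

    def detect(self, logs: List[str]) -> List[Tuple[str, bool]]:
        """Return (route, is_anomalous) for each log, in input order."""
        results: List[Tuple[str, bool]] = []
        for log in logs:
            if self.is_general(log):
                results.append(("general", self.small_model(log)))
            else:
                self.llm_calls += 1
                results.append(("proprietary", self.llm_with_rag(log)))
        return results
```

With toy keyword-based stand-ins, the routing decision alone determines which model sees each log:

```python
det = CollaborativeDetector(
    is_general=lambda log: "block" in log,
    small_model=lambda log: "exception" in log,
    llm_with_rag=lambda log: "FATAL" in log,
)
det.detect(["block served", "block exception", "kernel FATAL mce", "nova boot ok"])
```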
