Weakly Supervised Anomaly Detection via Knowledge-Data Alignment
Haihong Zhao, Chenyi Zi, Yang Liu, Chen Zhang, Yan Zhou, Jia Li
TL;DR
KDAlign tackles weakly supervised anomaly detection by introducing knowledge-data alignment, which leverages rule knowledge expressed as propositional formulae and aligns it with data representations via Optimal Transport. A dual-encoder architecture maps data and rules into a common embedding space, and a differentiable OT loss integrates knowledge into WSAD training, providing robustness to noisy or incomplete rules. Empirical results across five real-world datasets show KDAlign consistently improves over baselines, with notable gains in challenging settings and resilience to noisy knowledge. This neural-symbolic integration enhances explainability and generalization in anomaly detection, offering a practical framework for incorporating expert rules into data-driven models.
Abstract
Anomaly detection (AD) plays a pivotal role in numerous web-based applications, including malware detection, anti-money laundering, device failure detection, and network fault analysis. Most methods rely on unsupervised learning and struggle to reach satisfactory detection accuracy due to the lack of labels. Weakly Supervised Anomaly Detection (WSAD) has been introduced to enhance model performance by using a limited number of labeled anomaly samples. Nevertheless, it remains challenging for models trained on an inadequate amount of labeled data to generalize to unseen anomalies. In this paper, we introduce a novel framework, Knowledge-Data Alignment (KDAlign), which integrates rule knowledge, typically summarized by human experts, to supplement the limited labeled data. Specifically, we map these rules into the knowledge space and then recast the incorporation of knowledge as the alignment of knowledge and data. To facilitate this alignment, we employ the Optimal Transport (OT) technique. We then add the OT distance as an additional loss term to the original objective function of WSAD methods. Comprehensive experimental results on five real-world datasets demonstrate that our proposed KDAlign framework markedly surpasses its state-of-the-art counterparts, achieving superior performance across various anomaly types.
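To make the idea of adding an OT distance to a WSAD objective concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes data and rule embeddings are already available as arrays, uses uniform weights on both sides, and substitutes entropy-regularized OT (Sinkhorn iterations) with a squared Euclidean cost for whatever OT solver the paper uses. The names `sinkhorn_distance`, `combined_loss`, `reg`, and `weight` are hypothetical, introduced only for illustration.

```python
import numpy as np

def sinkhorn_distance(data_emb, rule_emb, reg=0.1, n_iters=200):
    """Entropy-regularized OT cost between two embedding sets (uniform weights).

    Assumption: squared Euclidean ground cost; the paper may use another cost.
    Returns the transport cost and the transport plan.
    """
    n, m = data_emb.shape[0], rule_emb.shape[0]
    # Pairwise squared Euclidean cost matrix, shape (n, m).
    diff = data_emb[:, None, :] - rule_emb[None, :, :]
    cost = np.sum(diff ** 2, axis=-1)
    # Rescale costs to [0, 1] so the Gibbs kernel does not underflow.
    scale = cost.max()
    cost_n = cost / scale if scale > 0 else cost
    kernel = np.exp(-cost_n / reg)
    a = np.full(n, 1.0 / n)  # uniform mass on data embeddings
    b = np.full(m, 1.0 / m)  # uniform mass on rule embeddings
    u = np.ones(n)
    for _ in range(n_iters):
        # Alternate scaling updates to match column, then row marginals.
        v = b / (kernel.T @ u)
        u = a / (kernel @ v)
    plan = u[:, None] * kernel * v[None, :]
    # Report the cost in the original (unscaled) units.
    return float(np.sum(plan * cost)), plan

def combined_loss(base_loss, ot_dist, weight=1.0):
    """Hypothetical total objective: WSAD loss plus a weighted OT term."""
    return base_loss + weight * ot_dist

# Toy usage with random stand-ins for encoder outputs.
rng = np.random.default_rng(0)
data_emb = rng.normal(size=(8, 4))   # 8 data samples, 4-dim embeddings
rule_emb = rng.normal(size=(5, 4))   # 5 rule embeddings in the same space
ot_dist, plan = sinkhorn_distance(data_emb, rule_emb)
total = combined_loss(base_loss=0.5, ot_dist=ot_dist, weight=0.1)
```

In an actual training loop the same computation would be written in an autodiff framework so that gradients of the OT term flow back into the encoders; the NumPy version here only evaluates the value.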
