SetAD: Semi-Supervised Anomaly Learning in Contextual Sets

Jianling Gao, Chongyang Tao, Xuelian Lin, Junfeng Liu, Shuai Ma

TL;DR

This work reframes semi-supervised anomaly detection (SSAD) as a set-level graded learning problem, arguing that anomalies are defined by their deviation within a group rather than in isolation. It introduces SetAD, an attention-based set encoder that learns to quantify set-level anomalousness via a graded regression objective and employs a context-calibrated scoring mechanism to stabilize point scores across diverse contexts. Empirical results on 10 real-world datasets show that SetAD outperforms state-of-the-art SSAD methods and that performance improves with larger contextual sets. The paper also provides theoretical analysis of the learned scoring function and the variance-reduction benefits of context normalization, highlighting robustness to contamination and data efficiency under limited labeled anomalies.

Abstract

Semi-supervised anomaly detection (AD) has shown great promise by effectively leveraging limited labeled data. However, existing methods are typically structured around scoring individual points or simple pairs. Such a point- or pair-centric view not only overlooks the contextual nature of anomalies, which are defined by their deviation from a collective group, but also fails to exploit the rich supervisory signals that can be generated from the combinatorial composition of sets. Consequently, such models struggle to exploit the high-order interactions within the data, which are critical for learning discriminative representations. To address these limitations, we propose SetAD, a novel framework that reframes semi-supervised AD as a Set-level Anomaly Detection task. SetAD employs an attention-based set encoder trained via a graded learning objective, where the model learns to quantify the degree of anomalousness within an entire set. This approach directly models the complex group-level interactions that define anomalies. Furthermore, to enhance robustness and score calibration, we propose a context-calibrated anomaly scoring mechanism, which assesses a point's anomaly score by aggregating its normalized deviations from peer behavior across multiple, diverse contextual sets. Extensive experiments on 10 real-world datasets demonstrate that SetAD significantly outperforms state-of-the-art models. Notably, we show that our model's performance consistently improves with increasing set size, providing strong empirical support for the set-based formulation of anomaly detection.
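To make the graded learning objective concrete, the following is a minimal sketch of how set-level training targets could be generated, assuming the regression target is simply the number of labeled anomalies placed in each sampled set and that unlabeled points are treated as mostly normal. The function name `sample_graded_set`, the uniform choice of the anomaly count, and the toy 2-D data are illustrative assumptions, not the authors' sampling procedure.

```python
import random

def sample_graded_set(unlabeled, labeled_anomalies, k, rng):
    """Draw one training set of size k and its graded regression target.

    Assumption: the target is the count of labeled anomalies in the set,
    and unlabeled points are treated as (mostly) normal.
    """
    max_anomalies = min(k, len(labeled_anomalies))
    m = rng.randint(0, max_anomalies)  # graded label, inclusive on both ends
    members = rng.sample(labeled_anomalies, m) + rng.sample(unlabeled, k - m)
    rng.shuffle(members)  # a set has no meaningful order
    return members, float(m)

# Toy data (hypothetical): unlabeled points near the origin, a few labeled anomalies far away.
rng = random.Random(0)
unlabeled = [[rng.gauss(0.0, 1.0) for _ in range(2)] for _ in range(100)]
labeled_anomalies = [[rng.gauss(6.0, 1.0) for _ in range(2)] for _ in range(5)]

# Each (members, target) pair would be one training example for a set encoder.
batch = [sample_graded_set(unlabeled, labeled_anomalies, k=8, rng=rng) for _ in range(4)]
```

Because the same handful of labeled anomalies can be combined with many different unlabeled subsets, even a small labeled pool yields a large number of distinct graded training sets, which is the combinatorial supervisory signal the abstract refers to.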

Paper Structure

This paper contains 20 sections, 7 equations, 4 figures, 3 tables, 2 algorithms.

Figures (4)

  • Figure 1: The proposed SetAD framework. The framework operates in two phases. (Left) A set scoring model is trained via graded regression to predict the number of anomalies in dynamically sampled sets. (Right) During inference, the trained model scores a test point by normalizing its score within shared contexts against the expected score of reference points.
  • Figure 2: AUC-PR w.r.t. different set sizes $k$.
  • Figure 3: AUC-PR w.r.t. different contamination rates.
  • Figure 4: AUC-PR w.r.t. the labeled ratio.
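The inference phase described in the Figure 1 caption can be sketched roughly as follows. This is only an illustration under stated assumptions: `toy_set_score` is a hypothetical stand-in for the trained set encoder, the normalization is assumed to be a z-score against reference points scored in the same context, and the aggregation is assumed to be a plain mean over contexts. The parameter names (`k`, `n_contexts`, `n_refs`, `eps`) are likewise illustrative.

```python
import random
import statistics

def toy_set_score(points):
    """Hypothetical stand-in for the trained set encoder: mean distance to the set centroid."""
    dim = len(points[0])
    centroid = [statistics.fmean(p[d] for p in points) for d in range(dim)]
    return statistics.fmean(
        sum((p[d] - centroid[d]) ** 2 for d in range(dim)) ** 0.5 for p in points
    )

def context_calibrated_score(x, pool, score_set, k=8, n_contexts=20, n_refs=10,
                             rng=None, eps=1e-8):
    """Average the normalized deviation of x from reference points over shared contexts."""
    rng = rng or random.Random()
    deviations = []
    for _ in range(n_contexts):
        # Draw context and reference points together so they do not overlap.
        drawn = rng.sample(pool, (k - 1) + n_refs)
        context, refs = drawn[:k - 1], drawn[k - 1:]
        # Score each reference point in the same context the test point will see.
        ref_scores = [score_set(context + [r]) for r in refs]
        mu = statistics.fmean(ref_scores)
        sigma = statistics.pstdev(ref_scores)
        # Assumed normalization: z-score of the test point against its peers.
        deviations.append((score_set(context + [x]) - mu) / (sigma + eps))
    return statistics.fmean(deviations)

# Toy usage (hypothetical data): a pool of mostly normal points and two test points.
data_rng = random.Random(0)
pool = [[data_rng.gauss(0.0, 1.0) for _ in range(2)] for _ in range(200)]
outlier_score = context_calibrated_score([8.0, 8.0], pool, toy_set_score, rng=random.Random(1))
inlier_score = context_calibrated_score([0.0, 0.0], pool, toy_set_score, rng=random.Random(1))
```

Scoring the test point and the reference points inside the same context means that an unusually spread-out or tight context shifts both scores together, so the normalized deviation depends less on which context happened to be drawn.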