Normality Calibration in Semi-supervised Graph Anomaly Detection

Guolei Zeng, Hezhe Qiao, Guoguo Ai, Jinsong Guo, Guansong Pang

TL;DR

GraphNC addresses overfitting in semi-supervised graph anomaly detection with a teacher-student calibration framework that jointly optimizes anomaly score alignment and representation consistency. ScoreDA aligns the student’s anomaly scores with a pre-trained teacher’s score distribution, while NormReg enforces perturbation-based consistency on labeled normal nodes to mitigate the teacher’s inaccuracies. The method is model-agnostic with respect to the teacher and improves AUROC and AUPRC over strong baselines across six datasets. It offers a practical, plug-in approach to improving generalization in semi-supervised GAD by calibrating in both the score space and the representation space.

Abstract

Graph anomaly detection (GAD) has attracted growing interest for its ability to uncover irregular patterns in a broad range of applications. Semi-supervised GAD, which assumes that a subset of annotated normal nodes is available during training, is among the most widely explored application settings. However, the normality learned by existing semi-supervised GAD methods is limited to the labeled normal nodes and therefore tends to overfit the given patterns. This can lead to high detection errors, such as a high false positive rate. To overcome this limitation, we propose GraphNC, a graph normality calibration framework that leverages both labeled and unlabeled data to calibrate the normality learned by a teacher model (a pre-trained semi-supervised GAD model) jointly in the anomaly score and node representation spaces. GraphNC includes two main components: anomaly score distribution alignment (ScoreDA) and perturbation-based normality regularization (NormReg). ScoreDA optimizes the anomaly scores of our model by aligning them with the score distribution yielded by the teacher model. Because the teacher model scores most normal nodes and some anomalous nodes accurately, the score alignment effectively pulls the anomaly scores of the normal and abnormal classes toward the two ends of the score range, resulting in more separable anomaly scores. Nevertheless, some of the teacher model’s scores are inaccurate. To mitigate the misleading effect of these scores, NormReg regularizes the graph normality in the representation space, making the representations of normal nodes more compact by minimizing a perturbation-guided consistency loss solely on the labeled nodes.
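
The two components described above can be sketched as a single training objective. This is a minimal illustration under stated assumptions, not the paper's formulation: the alignment term is written as a mean squared gap between student and teacher scores, the perturbation is additive Gaussian noise on node features, graph structure is ignored, and the names `encode`, `noise_std`, and `lam` are hypothetical.

```python
import random

def score_alignment_loss(student_scores, teacher_scores):
    """ScoreDA-style term (assumed form): mean squared gap between
    student scores and the frozen teacher's scores over all nodes."""
    assert len(student_scores) == len(teacher_scores)
    n = len(student_scores)
    return sum((s - t) ** 2 for s, t in zip(student_scores, teacher_scores)) / n

def consistency_loss(encode, features, labeled_normal_idx, noise_std=0.1, rng=None):
    """NormReg-style term (assumed form): mean squared distance between the
    representations of clean and noise-perturbed labeled normal nodes."""
    rng = rng or random.Random(0)
    total = 0.0
    for i in labeled_normal_idx:
        clean = encode(features[i])
        perturbed = encode([x + rng.gauss(0.0, noise_std) for x in features[i]])
        total += sum((a - b) ** 2 for a, b in zip(clean, perturbed))
    return total / len(labeled_normal_idx)

def joint_loss(student_scores, teacher_scores, encode, features,
               labeled_normal_idx, lam=1.0, noise_std=0.1, rng=None):
    """Weighted sum of the two terms; `lam` is a hypothetical trade-off weight."""
    align = score_alignment_loss(student_scores, teacher_scores)
    cons = consistency_loss(encode, features, labeled_normal_idx, noise_std, rng)
    return align + lam * cons
```

In this sketch the alignment term uses every node, labeled or not, while the consistency term touches only the labeled normal nodes, mirroring the division of labor in the abstract.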

Paper Structure

This paper contains 22 sections, 2 theorems, 14 equations, 6 figures, 5 tables, 1 algorithm.

Key Result

Theorem 1

Let $\sigma_\mathcal{T}^2$ and $\sigma_\mathcal{S}^2$ denote the variances of the anomaly scores yielded by the teacher model and the student model on the normal class, respectively. Then, under our GraphNC framework, the student model achieves shrinking score variance (i.e., $\sigma_\mathcal{S}^2 <$ …
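
The two variances named in the theorem can be estimated empirically by restricting each model's scores to the normal class. The following is a minimal sketch, assuming a hypothetical 0/1 label list (0 = normal) and made-up score lists; the numbers are for illustration only and are not results from the paper.

```python
from statistics import pvariance

def normal_class_variance(scores, labels):
    """Population variance of anomaly scores over nodes labeled normal (label 0)."""
    normal_scores = [s for s, y in zip(scores, labels) if y == 0]
    return pvariance(normal_scores)

# Toy inputs (hypothetical): four normal nodes and one anomaly.
labels = [0, 0, 0, 0, 1]
teacher_scores = [0.10, 0.30, 0.20, 0.40, 0.90]
student_scores = [0.10, 0.15, 0.12, 0.13, 0.95]

var_teacher = normal_class_variance(teacher_scores, labels)
var_student = normal_class_variance(student_scores, labels)
print(var_teacher, var_student)
```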

Figures (6)

  • Figure 1: (a) False positive rate and (b) false negative rate results on Amazon dou2020enhancing and Tolokers mcauley2015image. (c), (d), and (e) show the score distributions of normal and abnormal nodes for GGAD, ScoreDA, and ScoreDA+NormReg (i.e., GraphNC) on Amazon, where GGAD is used as a teacher model in both ScoreDA and GraphNC.
  • Figure 2: Overview of GraphNC. The input graph consists of a small labeled normal node set and a large unlabeled node set. GraphNC is based on a teacher-student network framework. ScoreDA aims to align the anomaly scores of the student network with the anomaly scores yielded by an existing semi-supervised GAD method to calibrate the normality in the score space. NormReg, on the other hand, is introduced to utilize a consistency regularization loss function in representation space to learn more compact representations of normal nodes, thereby reducing the negative impact of inaccurate anomaly scores that may be produced by the teacher model. The teacher network is pre-trained first, and it is frozen afterward, with only the student network being trained when jointly optimizing ScoreDA and NormReg.
  • Figure 3: (a) and (b) provide t-SNE visualizations of the node representations for GraphNC with and without NormReg. (c) The average deviation of the normal class on Tolokers.
  • Figure 4: (a-d) AUROC and AUPRC results w.r.t. $\alpha$ and $\omega$. (e-f) AUROC results w.r.t. $R$ on Photo and Reddit.
  • Figure 5: The score distributions of DOMINANT and the corresponding NomrDR-enabled DOMINANT on Amazon dou2020enhancing.
  • ...and 1 more figure
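
Figure 1(a) and (b) report false positive and false negative rates. As a reminder of how these follow from anomaly scores, here is a minimal sketch. The threshold, the label convention (1 = anomaly), and the function name `fpr_fnr` are assumptions for illustration, since the source does not state how the paper thresholds its scores.

```python
def fpr_fnr(scores, labels, threshold):
    """Return (FPR, FNR) when nodes with score >= threshold are flagged as anomalies.

    labels: 1 for anomalous nodes, 0 for normal nodes (assumed convention).
    """
    fp = tn = fn = tp = 0
    for s, y in zip(scores, labels):
        flagged = s >= threshold
        if y == 1:
            if flagged:
                tp += 1
            else:
                fn += 1
        else:
            if flagged:
                fp += 1
            else:
                tn += 1
    # Guard against empty classes to avoid division by zero.
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    fnr = fn / (fn + tp) if (fn + tp) else 0.0
    return fpr, fnr
```

A normal node that the detector scores above the threshold counts toward the false positive rate, which is the kind of error the abstract attributes to overfitting the labeled normal nodes.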

Theorems & Definitions (3)

  • Theorem 1
  • Theorem 1
  • proof