
Counterfactual Data Augmentation with Denoising Diffusion for Graph Anomaly Detection

Chunjing Xiao, Shikang Pang, Xovee Xu, Xuan Li, Goce Trajcevski, Fan Zhou

TL;DR

CAGAD introduces an unsupervised counterfactual data augmentation framework that improves graph anomaly detection by targeting the neighborhood aggregation of heterophilic nodes. It combines a graph pointer neural network, which identifies heterophilic nodes, with a DDPM-based anomaly generator, which translates selected neighbors into anomalous ones; a counterfactual GNN then aggregates these translated neighbors to produce counterfactual node representations. The approach outperforms strong baselines on four datasets, with average gains of 2.35% in F1, 2.53% in AUC-ROC, and 2.79% in AUC-PR, and it remains applicable to test data without labeled anomalies. The work connects GNNs to causal representation learning by treating neighborhood manipulation as an intervention that promotes the invariance and discriminability of anomalous signals. Overall, CAGAD uses unsupervised counterfactual augmentation to mitigate the over-smoothing and class-imbalance effects that hamper graph anomaly detection.
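The heterophily criterion can be made concrete with a small sketch. It assumes ground-truth labels are available for every node, which CAGAD does not assume (it learns a graph pointer neural network to flag heterophilic nodes instead), and treats a node as heterophilic when most of its neighbors carry a different label. The function names and the 0.5 threshold are illustrative choices, not taken from the paper.

```python
from collections import defaultdict

# Illustrative, label-based proxy for heterophily. NOT CAGAD's detector,
# which is a learned graph pointer neural network working without labels.

def heterophily_ratios(edges, labels):
    """Fraction of each node's neighbors whose label differs from its own."""
    neighbors = defaultdict(set)
    for u, v in edges:  # treat every edge as undirected
        neighbors[u].add(v)
        neighbors[v].add(u)
    ratios = {}
    for node, nbrs in neighbors.items():
        differing = sum(1 for n in nbrs if labels[n] != labels[node])
        ratios[node] = differing / len(nbrs)
    return ratios

def heterophilic_nodes(edges, labels, threshold=0.5):
    """Nodes for which more than `threshold` of the neighbors have a different label."""
    ratios = heterophily_ratios(edges, labels)
    return sorted(n for n, r in ratios.items() if r > threshold)

# Toy graph: node 0 is an anomaly (label 1) surrounded by normal nodes (label 0).
edges = [(0, 1), (0, 2), (0, 3), (1, 2), (3, 4)]
labels = {0: 1, 1: 0, 2: 0, 3: 0, 4: 0}
print(heterophilic_nodes(edges, labels))  # [0]
```

Node 0 is the only node whose neighborhood is dominated by nodes of the other class, which is exactly the situation in which mean-style aggregation dilutes an anomaly's signal.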

Abstract

A critical aspect of Graph Neural Networks (GNNs) is to enhance node representations by aggregating information from each node's neighborhood. However, when detecting anomalies, the representations of abnormal nodes are prone to being averaged out by their normal neighbors, making the learned anomaly representations less distinguishable. To tackle this issue, we propose CAGAD, an unsupervised Counterfactual data Augmentation method for Graph Anomaly Detection, which introduces a graph pointer neural network as a heterophilic node detector to identify potential anomalies whose neighborhoods are dominated by normal nodes. For each identified potential anomaly, we design a graph-specific diffusion model to translate some of its neighbors, which are probably normal, into anomalous ones. Finally, we include these translated neighbors in GNN neighborhood aggregation to produce counterfactual representations of anomalies. By aggregating the translated anomalous neighbors, the counterfactual representations become more distinguishable and further improve detection performance. Experimental results on four datasets demonstrate that CAGAD significantly outperforms strong baselines, with average improvements of 2.35% in F1, 2.53% in AUC-ROC, and 2.79% in AUC-PR.
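The averaging problem and the counterfactual fix can be illustrated with a one-dimensional toy example. This sketch assumes plain mean aggregation and a hypothetical `translate` function that maps selected neighbors to anomaly-like features. In CAGAD that role is played by the diffusion-based generator, and aggregation is a learned GNN layer, so the numbers only illustrate the direction of the effect.

```python
import numpy as np

def mean_aggregate(x_self, x_neighbors):
    """One mean-aggregation step over the node itself and its neighbors."""
    stacked = np.vstack([x_self[None, :], x_neighbors])
    return stacked.mean(axis=0)

def counterfactual_aggregate(x_self, x_neighbors, translate, idx):
    """Translate the neighbors at positions `idx`, then aggregate as usual.

    `translate` is a hypothetical stand-in for CAGAD's anomaly generator;
    the original neighbor features are left unchanged (we work on a copy).
    """
    translated = x_neighbors.copy()
    translated[idx] = translate(x_neighbors[idx])
    return mean_aggregate(x_self, translated)

# Anomalous node (feature 1.0) with three normal neighbors (feature 0.0).
x_self = np.array([1.0])
x_nb = np.zeros((3, 1))

# Hypothetical translator: map selected neighbors to the anomaly-like value 1.0.
to_anomalous = lambda x: np.ones_like(x)

orig = mean_aggregate(x_self, x_nb)
cf = counterfactual_aggregate(x_self, x_nb, to_anomalous, idx=[0, 1])
print(orig, cf)  # [0.25] [0.75]
```

With three normal neighbors, the anomaly's aggregated feature drops from 1.0 to 0.25. After two neighbors are translated, it rises to 0.75, so the counterfactual representation sits farther from the normal value of 0.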

Paper Structure (24 sections, 21 equations, 9 figures, 3 tables)

Figures (9)

  • Figure 1: Nodes with heterophily-dominant neighbors.
  • Figure 2: Sketch of CAGAD. Heterophilic nodes are nodes most of whose neighbors have properties or labels different from their own.
  • Figure 3: Anomalous generator $G_\text{ano}$: red arrows $\rightarrow$ indicate forward diffusion, blue ones $\rightarrow$ refer to the reverse diffusion; $\oplus$ is the concatenation operation.
  • Figure 4: Original GNN vs. Counterfactual GNN.
  • Figure 5: Ablation study results on four datasets.
  • ...and 4 more figures
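The forward (red) and reverse (blue) diffusion arrows in Figure 3 refer to the two halves of a denoising diffusion probabilistic model. The sketch below assumes the standard DDPM formulation (Ho et al., 2020) with a linear noise schedule. The schedule values, function names, and the placeholder noise predictor are our assumptions, and CAGAD's graph-specific conditioning (the concatenation in Figure 3) and its denoising network are not shown.

```python
import numpy as np

# Assumed linear noise schedule (standard DDPM defaults, not from the paper).
T = 1000
betas = np.linspace(1e-4, 0.02, T)
alphas = 1.0 - betas
alpha_bars = np.cumprod(alphas)  # alpha_bar_t = product of alpha_s for s <= t

def q_sample(x0, t, eps):
    """Forward diffusion: sample x_t ~ q(x_t | x_0) in closed form."""
    return np.sqrt(alpha_bars[t]) * x0 + np.sqrt(1.0 - alpha_bars[t]) * eps

def predict_x0(xt, t, eps_hat):
    """Invert the forward step given a (predicted) noise estimate."""
    return (xt - np.sqrt(1.0 - alpha_bars[t]) * eps_hat) / np.sqrt(alpha_bars[t])

def p_sample(xt, t, eps_hat, rng):
    """One reverse step x_t -> x_{t-1}: DDPM posterior mean plus noise with variance beta_t."""
    mean = (xt - betas[t] / np.sqrt(1.0 - alpha_bars[t]) * eps_hat) / np.sqrt(alphas[t])
    if t == 0:
        return mean
    return mean + np.sqrt(betas[t]) * rng.standard_normal(xt.shape)

def training_loss(eps_model, x0, rng):
    """Simplified DDPM objective: MSE between true and predicted noise at a random step.

    `eps_model(xt, t)` is a hypothetical noise predictor (a GNN-based denoiser in CAGAD's setting).
    """
    t = rng.integers(T)
    eps = rng.standard_normal(x0.shape)
    xt = q_sample(x0, t, eps)
    return np.mean((eps - eps_model(xt, t)) ** 2)

rng = np.random.default_rng(0)
x0 = rng.standard_normal((5, 16))      # e.g. five neighbor feature vectors
eps = rng.standard_normal(x0.shape)
xt = q_sample(x0, 500, eps)            # noised features at step t = 500
recovered = predict_x0(xt, 500, eps)   # with the true noise, x_0 is recovered exactly
x_prev = p_sample(xt, 500, eps, rng)   # one denoising step toward x_0
```

Training fits `eps_model` so that `training_loss` is small; at generation time, repeated `p_sample` calls from t = T - 1 down to 0 turn noise back into feature vectors.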