Balanced Anomaly-guided Ego-graph Diffusion Model for Inductive Graph Anomaly Detection

Chunyu Wei, Siyuan He, Yu Wang, Yueguo Chen, Yunhai Wang, Bing Bai, Yidong Zhang, Yong Xie, Shunming Zhang, Fei Wang

TL;DR

BAED tackles inductive graph anomaly detection under dynamic graphs and extreme label imbalance. It introduces a discrete ego-graph diffusion model to synthesize anomaly-focused ego-graphs and a curriculum anomaly augmentation mechanism to adapt generation during training. The approach achieves state-of-the-art results across five large-scale datasets, with notable gains on sparse or evolving graphs and improved data diversity. The combination of topology-aware data synthesis and adaptive learning offers practical benefits for fraud detection and cybersecurity in real-world networks.

Abstract

Graph anomaly detection (GAD) is crucial in applications like fraud detection and cybersecurity. Despite recent advancements using graph neural networks (GNNs), two major challenges persist. At the model level, most methods adopt a transductive learning paradigm, which assumes static graph structures, making them unsuitable for dynamic, evolving networks. At the data level, the extreme class imbalance, where anomalous nodes are rare, leads to biased models that fail to generalize to unseen anomalies. These challenges are interdependent: static transductive frameworks limit effective data augmentation, while imbalance exacerbates model distortion in inductive learning settings. To address these challenges, we propose a novel data-centric framework that integrates dynamic graph modeling with balanced anomaly synthesis. Our framework features: (1) a discrete ego-graph diffusion model, which captures the local topology of anomalies to generate ego-graphs aligned with the anomalous structural distribution, and (2) a curriculum anomaly augmentation mechanism, which dynamically adjusts synthetic data generation during training, focusing on underrepresented anomaly patterns to improve detection and generalization. Experiments on five datasets demonstrate the effectiveness of our framework.
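The abstract builds on ego-graphs, the local neighborhood around a single node. As a minimal sketch of what a K-hop ego-graph is, the snippet below extracts one with a breadth-first search over an adjacency dictionary. The function name `ego_graph`, the dictionary representation, and the toy graph are illustrative assumptions, not the authors' code.

```python
from collections import deque


def ego_graph(adj, center, k):
    """Return (nodes, edges) of the k-hop ego-graph around `center`.

    `adj` maps each node to an iterable of its neighbors and is assumed
    to describe an undirected graph (hypothetical representation).
    """
    dist = {center: 0}
    queue = deque([center])
    while queue:
        u = queue.popleft()
        if dist[u] == k:
            continue  # nodes at distance k are kept but not expanded
        for v in adj.get(u, ()):
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    nodes = set(dist)
    # Keep every edge whose two endpoints both lie inside the ego-graph.
    edges = {
        tuple(sorted((u, v)))
        for u in nodes
        for v in adj.get(u, ())
        if v in nodes
    }
    return nodes, edges


# Toy path-like graph: 2 - 0 - 1 - 3 - 4
adj = {0: [1, 2], 1: [0, 3], 2: [0], 3: [1, 4], 4: [3]}
nodes, edges = ego_graph(adj, center=0, k=2)
print(sorted(nodes), sorted(edges))
```

Because each ego-graph depends only on a node's local neighborhood, a model trained on such subgraphs can score nodes that arrive after training, which is the inductive setting the abstract describes.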

Paper Structure

This paper contains 35 sections, 4 theorems, 24 equations, 6 figures, 5 tables, 3 algorithms.

Key Result

Proposition 4.1

The magnitude of the deviation $\|h_{\mathcal{G}_{K}^{i}}\|_2$ increases as the central node $v_i$ behaves more abnormally relative to its local neighborhood. Formally, given the ego-graph $\mathcal{G}_{K}^{i}$, the magnitude $\|h_{\mathcal{G}_{K}^{i}}\|_2$ grows with $\delta(v_i, \mathcal{N}^{K}(i))$, where $\delta(v_i, \mathcal{N}^{K}(i))$ denotes the degree of deviation of $v_i$ from the distribution of its neighbors $\mathcal{N}^{K}(i)$.
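To build intuition for the proposition, the sketch below uses a simple stand-in for the deviation: the L2 norm of the difference between the central node's feature vector and the mean feature vector of its neighbors. This stand-in, the function name `deviation_norm`, and the toy features are assumptions for illustration only; the paper's actual definition of $h_{\mathcal{G}_{K}^{i}}$ is not reproduced in this summary.

```python
import math


def deviation_norm(features, center, neighbors):
    """L2 norm of (center feature - mean neighbor feature).

    An illustrative proxy for the deviation magnitude, not the paper's
    definition of h.
    """
    dim = len(features[center])
    mean = [
        sum(features[n][d] for n in neighbors) / len(neighbors)
        for d in range(dim)
    ]
    diff = [features[center][d] - mean[d] for d in range(dim)]
    return math.sqrt(sum(x * x for x in diff))


# Hypothetical 2-dimensional node features.
features = {
    "a": [1.0, 1.0],  # resembles its neighbors
    "b": [1.0, 1.2],
    "c": [0.9, 1.0],
    "x": [4.0, 5.0],  # far from the same neighbors
}
typical = deviation_norm(features, "a", ["b", "c"])
outlier = deviation_norm(features, "x", ["b", "c"])
print(typical < outlier)
```

Under this proxy, a node whose features sit far from its neighborhood mean gets a larger norm, which matches the qualitative claim of the proposition.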

Figures (6)

  • Figure 1: (a) Vanilla GAD paradigm: transductive learning is performed on the whole graph with imbalanced labels. (b) Graph-level data augmentation: fixed preprocessing that requires retraining the model whenever new samples are introduced, limiting adaptability during the training process. (c) Inductive GAD on ego-graphs: inductive learning on ego-graphs inherits the label imbalance issue from the original graph. (d) Dynamic and balanced ego-graph augmentation: our proposed inductive framework dynamically adjusts the type and ratio of generated samples during training to address imbalance and enhance model adaptability in real time.
  • Figure 2: The BAED framework. Left: Pre-training the discrete ego-graph diffusion model with forward noise addition and reverse denoising processes. Right: Training iterations where the anomaly detection model processes both the original imbalanced batch and the augmented samples. The Guidance Embedding Generator (GIN) encodes anomalous ego-graphs into guidance embeddings, which are dynamically weighted based on previous losses to focus on underrepresented anomaly types.
  • Figure 3: Impact of Anomaly-Guidance Embedding.
  • Figure 4: Comparison of model loss curves for different weighting strategies on the Reddit dataset.
  • Figure 5: Comparison of different augmentation methods
  • ...and 1 more figure
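The Figure 2 caption says guidance embeddings are weighted by previous losses so that generation focuses on underrepresented anomaly types. One plausible way to realize such a weighting is sketched below: a softmax over the previous per-anomaly losses, followed by weighted sampling. The softmax rule, the `temperature` parameter, and the names `loss_weights` and `sample_guidance` are assumptions for illustration; the paper's exact weighting scheme is not given in this summary.

```python
import math
import random


def loss_weights(prev_losses, temperature=1.0):
    """Softmax over previous losses: a higher loss yields a larger weight."""
    top = max(prev_losses)  # subtract the max for numerical stability
    exps = [math.exp((loss - top) / temperature) for loss in prev_losses]
    total = sum(exps)
    return [e / total for e in exps]


def sample_guidance(indices, prev_losses, n, seed=0):
    """Draw n guidance indices, favoring anomalies the detector got wrong."""
    rng = random.Random(seed)
    return rng.choices(indices, weights=loss_weights(prev_losses), k=n)


# Hypothetical losses for three anomalous ego-graphs from the last iteration.
prev_losses = [0.2, 1.5, 0.7]
weights = loss_weights(prev_losses)
picked = sample_guidance([0, 1, 2], prev_losses, n=5)
print(weights, picked)
```

In this sketch, the ego-graph with the highest previous loss is the most likely to be chosen as guidance for the next round of synthetic samples. Lowering `temperature` sharpens that preference, while raising it moves sampling toward uniform.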

Theorems & Definitions (4)

  • Proposition 4.1
  • Proposition 4.2
  • Theorem 1: Optimal Decision Boundary under Imbalance
  • Proposition 4.3: BAED's Balancing Effect