Mitigating Homophily Disparity in Graph Anomaly Detection: A Scalable and Adaptive Approach

Yunhui Liu, Qizhuo Xie, Yinfeng Chen, Xudong Jin, Tao Zheng, Bin Chong, Tieke He

TL;DR

SAGAD supports mini-batch training, achieves linear time and space complexity, drastically reduces memory usage on large-scale graphs, and ensures asymptotic linear separability between normal and abnormal nodes under mild conditions.

Abstract

Graph anomaly detection (GAD) aims to identify nodes that deviate from normal patterns in structure or features. While recent GNN-based approaches have advanced this task, they struggle with two major challenges: 1) homophily disparity, where nodes exhibit varying homophily at both class and node levels; and 2) limited scalability, as many methods rely on costly whole-graph operations. To address them, we propose SAGAD, a Scalable and Adaptive framework for GAD. SAGAD precomputes multi-hop embeddings and applies reparameterized Chebyshev filters to extract low- and high-frequency information, enabling efficient training and capturing both homophilic and heterophilic patterns. To mitigate node-level homophily disparity, we introduce an Anomaly Context-Aware Adaptive Fusion, which adaptively fuses low- and high-pass embeddings using fusion coefficients conditioned on Rayleigh Quotient-guided anomalous subgraph structures for each node. To alleviate class-level disparity, we design a Frequency Preference Guidance Loss, which encourages anomalies to preserve more high-frequency information than normal nodes. SAGAD supports mini-batch training, achieves linear time and space complexity, and drastically reduces memory usage on large-scale graphs. Theoretically, SAGAD ensures asymptotic linear separability between normal and abnormal nodes under mild conditions. Extensive experiments on 10 benchmarks confirm SAGAD's superior accuracy and scalability over state-of-the-art methods.
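
The precomputation step described in the abstract can be illustrated with the standard Chebyshev recurrence. This is a minimal sketch, not the authors' implementation: it assumes the rescaled Laplacian $\tilde{L} = L_{sym} - I = -D^{-1/2} A D^{-1/2}$ (i.e. $\lambda_{max} = 2$), and the function names `chebyshev_hops` and `apply_filter` as well as the fixed low-/high-pass coefficients are hypothetical stand-ins for SAGAD's learned, reparameterized coefficients.

```python
import numpy as np

def chebyshev_hops(A, X, K):
    """Precompute [T_0(L~) X, ..., T_K(L~) X] once, before training.

    Assumes L~ = -D^{-1/2} A D^{-1/2} (symmetric normalized Laplacian
    shifted to the interval [-1, 1]).
    """
    A = np.asarray(A, dtype=float)
    deg = A.sum(axis=1)
    d_inv_sqrt = np.zeros_like(deg)
    nonzero = deg > 0
    d_inv_sqrt[nonzero] = deg[nonzero] ** -0.5
    L_tilde = -(d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :])

    hops = [X]                      # T_0(L~) X = X
    if K >= 1:
        hops.append(L_tilde @ X)    # T_1(L~) X = L~ X
    for _ in range(2, K + 1):
        # Chebyshev recurrence: T_k = 2 L~ T_{k-1} - T_{k-2}
        hops.append(2.0 * L_tilde @ hops[-1] - hops[-2])
    return hops

def apply_filter(hops, coeffs):
    """Combine precomputed hops with per-order coefficients."""
    return sum(c * h for c, h in zip(coeffs, hops))

# Toy 4-node cycle graph with a constant (purely low-frequency) signal.
A = np.array([[0, 1, 0, 1],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [1, 0, 1, 0]])
X = np.ones((4, 1))
hops = chebyshev_hops(A, X, K=2)

# Hypothetical fixed coefficients: (1 - L~)/2 is low-pass, (1 + L~)/2 is high-pass.
low = apply_filter(hops, [0.5, -0.5, 0.0])
high = apply_filter(hops, [0.5, 0.5, 0.0])

# Because hops are precomputed, a mini-batch only needs row slices.
batch = [0, 2]
low_batch = apply_filter([h[batch] for h in hops], [0.5, -0.5, 0.0])
```

On this toy input the constant signal passes through the low-pass branch unchanged and is removed by the high-pass branch. Since the graph is touched only during precomputation, the per-batch cost depends on the batch size and the order $K$, which is the property that makes mini-batch training possible in this kind of design.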

Paper Structure

This paper contains 20 sections, 2 theorems, 14 equations, 5 figures, and 7 tables.

Key Result

Lemma 1

The Rayleigh Quotient $RQ(\boldsymbol{x}, \boldsymbol{L})$, which quantifies the accumulated spectral energy of a graph signal, is monotonically increasing with the node's anomaly degree.
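
For reference, the Rayleigh Quotient is commonly defined as $RQ(\boldsymbol{x}, \boldsymbol{L}) = \boldsymbol{x}^\top \boldsymbol{L} \boldsymbol{x} / \boldsymbol{x}^\top \boldsymbol{x}$. The lemma's exact setting (which Laplacian is used and how anomaly degree is measured) is not stated on this page, so the following is only a minimal sketch that assumes the unnormalized Laplacian $\boldsymbol{L} = \boldsymbol{D} - \boldsymbol{A}$ on a toy path graph. The helper names `rayleigh_quotient` and `path_laplacian` are hypothetical.

```python
import numpy as np

def rayleigh_quotient(x, L):
    """Return x^T L x / x^T x for a graph signal x and Laplacian L."""
    x = np.asarray(x, dtype=float)
    denom = x @ x
    if denom == 0.0:
        raise ValueError("Rayleigh Quotient is undefined for the zero signal")
    return float(x @ L @ x / denom)

def path_laplacian(n):
    """Unnormalized Laplacian L = D - A of a path graph on n nodes (assumption)."""
    A = np.zeros((n, n))
    idx = np.arange(n - 1)
    A[idx, idx + 1] = 1.0
    A[idx + 1, idx] = 1.0
    return np.diag(A.sum(axis=1)) - A

L = path_laplacian(5)

smooth = np.ones(5)             # every node agrees with its neighbors
mild = smooth.copy()
mild[2] = 3.0                   # middle node deviates slightly
strong = smooth.copy()
strong[2] = 5.0                 # middle node deviates more

rq_smooth = rayleigh_quotient(smooth, L)   # 0.0 for a constant signal
rq_mild = rayleigh_quotient(mild, L)       # 8/13, about 0.615
rq_strong = rayleigh_quotient(strong, L)   # 32/29, about 1.103
```

In this toy case the quotient grows as the middle node deviates further from its neighbors, which matches the intuition behind Lemma 1. It is an illustration on one small graph, not a proof of the lemma.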

Figures (5)

  • Figure 1: (a), (b): Distribution of node homophily on Weibo and T-Finance. (c), (d): Performance disparity across node homophily quartiles (Q1 = top 25% homophily, Q4 = bottom 25%) on Weibo and T-Finance.
  • Figure 2: Overview of our proposed SAGAD. It consists of three main designs: 1) Dual-pass Chebyshev Polynomial Filter extracts both low- and high-frequency embeddings to capture homophilic and heterophilic patterns; 2) Anomaly Context-aware Adaptive Fusion dynamically integrates these embeddings based on node-specific structural contexts; 3) Frequency Preference Guidance Loss regularizes fusion weights to align with class-specific spectral preferences, enhancing anomaly discrimination.
  • Figure 3: Performance disparity across node homophily quartiles (Q1 = top 25% homophily, Q4 = bottom 25%) on Amazon and YelpChi.
  • Figure 4: Visualization of the learned coefficients for the top 8 dimensions. Nodes ($a_1, a_2, a_3$) and ($n_1, n_2, n_3$) are randomly selected abnormal and normal nodes, respectively.
  • Figure 5: How the AUPRC score varies with different values of $p_a$ and $p_n$.

Theorems & Definitions (3)

  • Lemma 1
  • Definition 1: $CSBM(n_a, n_n, \boldsymbol{\mu}, \boldsymbol{\nu}, (p_1,q_1),(p_2,q_2), \boldsymbol{\theta})$
  • Theorem 1