Truncated Affinity Maximization: One-class Homophily Modeling for Graph Anomaly Detection

Hezhe Qiao, Guansong Pang

TL;DR

This work identifies a one-class homophily phenomenon in GAD, where normal nodes exhibit stronger mutual affinity than anomalous ones. It proposes local node affinity as an unsupervised anomaly score and introduces Truncated Affinity Maximization (TAM), a graph neural network framework built from Local Affinity Maximization networks (LAMNet) and Normal Structure-preserved Graph Truncation (NSGT). TAM learns tailored node representations by maximizing local affinity on progressively truncated graphs and ensembles multiple LAMNets to produce robust anomaly scores. Across 10 real-world datasets, TAM substantially outperforms seven competing methods, with over 10% AUROC/AUPRC gains over the best contenders on challenging datasets.

Abstract

We reveal a one-class homophily phenomenon, a prevalent property we find empirically in real-world graph anomaly detection (GAD) datasets: normal nodes tend to have strong connection/affinity with each other, while the homophily of abnormal nodes is significantly weaker than that of normal nodes. However, this anomaly-discriminative property is ignored by existing GAD methods, which are typically built on a conventional anomaly detection objective such as data reconstruction. In this work, we exploit this property to introduce a novel unsupervised anomaly scoring measure for GAD, local node affinity, which assigns a larger anomaly score to nodes that are less affiliated with their neighbors, with affinity defined as similarity on node attributes/representations. We further propose Truncated Affinity Maximization (TAM), which learns tailored node representations for our anomaly measure by maximizing the local affinity of nodes to their neighbors. Optimizing on the original graph structure can be biased by non-homophily edges (i.e., edges connecting normal and abnormal nodes). TAM is therefore optimized on truncated graphs from which non-homophily edges are iteratively removed to mitigate this bias. The learned representations yield significantly stronger local affinity for normal nodes than for abnormal nodes. Extensive empirical results on 10 real-world GAD datasets show that TAM substantially outperforms seven competing models, achieving over 10% increase in AUROC/AUPRC compared to the best contenders on challenging datasets. Our code is available at https://github.com/mala-lab/TAM-master/.
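The local node affinity score can be illustrated with a minimal NumPy sketch. It assumes cosine similarity as the affinity measure, undirected edges without self-loops, and a plain average over an ensemble of models; the function names `local_affinity` and `anomaly_scores` are hypothetical and not taken from the TAM repository. In TAM the representations would come from trained LAMNets, whereas here they are simply passed in.

```python
import numpy as np


def local_affinity(h, edges):
    """Mean cosine similarity of each node to its neighbors (undirected, no self-loops)."""
    n = h.shape[0]
    # L2-normalize rows so that a dot product equals cosine similarity
    norms = np.linalg.norm(h, axis=1, keepdims=True)
    z = h / np.clip(norms, 1e-12, None)
    total = np.zeros(n)
    degree = np.zeros(n)
    for i, j in edges:
        s = float(z[i] @ z[j])
        total[i] += s
        total[j] += s
        degree[i] += 1
        degree[j] += 1
    # Assumption: isolated nodes get affinity 0 (the paper may handle them differently)
    return np.divide(total, degree, out=np.zeros(n), where=degree > 0)


def anomaly_scores(reps, edge_sets):
    """Hypothetical ensemble: average affinity over models, then negate
    so that weakly affiliated nodes receive the highest anomaly scores."""
    aff = np.mean([local_affinity(h, e) for h, e in zip(reps, edge_sets)], axis=0)
    return -aff
```

Ranking nodes by `anomaly_scores` puts the least affiliated nodes first. Because negation is linear, averaging affinities and then negating gives the same result as averaging per-model scores.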

Paper Structure (24 sections, 11 equations, 10 figures, 11 tables, 2 algorithms)

Figures (10)

  • Figure 1: (a) Homophily and (b) local affinity distributions of normal and abnormal nodes on two popular benchmarks, BlogCatalog [tang2009relational] and Amazon [dou2020enhancing]. The homophily of a given node is calculated as the proportion of its neighbors that share its class label [gao2023alleviating]. The local affinity is calculated on raw attributes (RA) and on node representations learned by DGI [velickovic2019deep] and TAM, respectively.
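The node homophily plotted in Figure 1(a) can be sketched as follows, assuming the common definition of node homophily as the fraction of a node's neighbors that share its label; the exact formula in the cited work may differ, and `node_homophily` is a hypothetical name. Labels are only needed for this analysis, not for TAM's unsupervised training.

```python
import numpy as np


def node_homophily(labels, edges):
    """Fraction of each node's neighbors that share its label (NaN for isolated nodes)."""
    labels = np.asarray(labels)
    n = labels.shape[0]
    same = np.zeros(n)
    deg = np.zeros(n)
    for i, j in edges:  # undirected edges, each listed once
        match = float(labels[i] == labels[j])
        same[i] += match
        same[j] += match
        deg[i] += 1
        deg[j] += 1
    out = np.full(n, np.nan)
    np.divide(same, deg, out=out, where=deg > 0)
    return out
```

Splitting the result by label (normal vs. abnormal) gives the two distributions compared in the figure.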
  • Figure 2: Overview of TAM. (a) TAM leverages the observation that normal nodes have stronger affinity relations to their neighbors than anomalies to learn an unsupervised GAD model. It learns a set of affinity maximization GNNs (i.e., LAMNet) on a set of sequentially truncated graphs yielded by our probabilistic graph truncation method NSGT. We build an ensemble of TAM models to make use of the randomness in NSGT for more effective GAD. (b) NSGT iteratively removes edges with a probability proportional to the distance between the connected nodes.
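A simplified, hypothetical reading of the NSGT step in Figure 2(b) is sketched below. It is not the exact rule from the paper: it assumes Euclidean distances on raw node attributes and a single random threshold per iteration drawn uniformly between the mean and maximum edge distance. Under this variant, an edge is removed when its distance exceeds the threshold, so its removal probability grows linearly with distance above the mean instead of being strictly proportional to distance. The names `truncate_once` and `truncate` are illustrative only.

```python
import numpy as np


def truncate_once(x, edges, rng):
    """One simplified truncation step: drop edges whose endpoint distance exceeds a random threshold."""
    if not edges:
        return []
    d = np.array([np.linalg.norm(x[i] - x[j]) for i, j in edges])
    # Assumed threshold r ~ U(mean, max): short edges (below the mean) are always kept,
    # and long edges are removed with probability increasing in their distance.
    r = rng.uniform(d.mean(), d.max())
    return [e for e, dist in zip(edges, d) if dist <= r]


def truncate(x, edges, depth, seed=0):
    """Return the original graph followed by `depth` successively truncated graphs."""
    rng = np.random.default_rng(seed)
    graphs = [list(edges)]
    for _ in range(depth):
        graphs.append(truncate_once(x, graphs[-1], rng))
    return graphs
```

In TAM, a LAMNet would be trained on each truncated graph, and repeating the procedure with different seeds supplies the randomness that the ensemble exploits.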
  • Figure 3: (a) and (b): Euclidean distance statistics of homophily (N-N) edges, which connect normal nodes, and non-homophily (N-A) edges, which connect normal and abnormal nodes, on BlogCatalog and Amazon, respectively. (c) Homophily of normal nodes and (d) the number of non-homophily edges as the truncation iterations/depth increase.
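The edge statistics in Figure 3(a)-(b) can be reproduced in spirit with the sketch below. It assumes Euclidean distance on raw node attributes and ground-truth anomaly labels used only for analysis; `edge_distance_stats` is a hypothetical name. Applying it to each successively truncated graph and tracking the N-A count would give a curve like panel (d).

```python
import numpy as np


def edge_distance_stats(x, edges, is_anomaly):
    """Count and mean Euclidean distance of N-N and N-A edges."""
    groups = {"N-N": [], "N-A": []}
    for i, j in edges:
        a_i, a_j = bool(is_anomaly[i]), bool(is_anomaly[j])
        dist = float(np.linalg.norm(x[i] - x[j]))
        if not a_i and not a_j:
            groups["N-N"].append(dist)  # homophily edge between two normal nodes
        elif a_i != a_j:
            groups["N-A"].append(dist)  # non-homophily edge: one normal, one abnormal
        # Edges between two abnormal nodes are left out of this comparison
    return {
        name: {"count": len(d), "mean": float(np.mean(d)) if d else float("nan")}
        for name, d in groups.items()
    }
```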
  • Figure 4: (a) TAM vs. Degree and TAM-T. (b) TAM results w.r.t. graph truncation depth $K$.
  • Figure 5: Homophily distribution of normal and abnormal nodes on the remaining eight datasets.