Generative Semi-supervised Graph Anomaly Detection

Hezhe Qiao, Qingsong Wen, Xiaoli Li, Ee-Peng Lim, Guansong Pang

TL;DR

GGAD generates reliable outlier nodes that assimilate anomaly nodes in both graph structure and feature representations, and it substantially outperforms state-of-the-art unsupervised and semi-supervised GAD methods with varying numbers of training normal nodes.

Abstract

This work considers a practical semi-supervised graph anomaly detection (GAD) scenario, where part of the nodes in a graph are known to be normal, in contrast to the extensively explored unsupervised setting with a fully unlabeled graph. We reveal that access to even a small percentage of normal nodes helps enhance the detection performance of existing unsupervised GAD methods when they are adapted to the semi-supervised setting. However, their utilization of these normal nodes is limited. In this paper, we propose a novel Generative GAD approach (namely GGAD) for the semi-supervised scenario to better exploit the normal nodes. The key idea is to generate pseudo anomaly nodes, referred to as 'outlier nodes', that provide effective negative node samples for training a discriminative one-class classifier. The main challenge here lies in the lack of ground truth information about real anomaly nodes. To address this challenge, GGAD is designed to leverage two important priors about the anomaly nodes -- asymmetric local affinity and egocentric closeness -- to generate reliable outlier nodes that assimilate anomaly nodes in both graph structure and feature representations. Comprehensive experiments on six real-world GAD datasets are performed to establish a benchmark for semi-supervised GAD and show that GGAD substantially outperforms state-of-the-art unsupervised and semi-supervised GAD methods with varying numbers of training normal nodes. Code will be made available at https://github.com/mala-lab/GGAD.
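The asymmetric local affinity prior can be made concrete with a small sketch. The abstract does not give a formula, so the code below assumes that a node's local affinity is the mean cosine similarity between its feature vector and those of its neighbors. The function names (`cosine`, `local_affinity`) and the toy graph are hypothetical and serve only as an illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def local_affinity(features, neighbors):
    """Assumed definition: mean cosine similarity between a node and its neighbors."""
    scores = {}
    for node, nbrs in neighbors.items():
        if not nbrs:
            scores[node] = 0.0  # isolated node: no neighbors to compare against
            continue
        total = sum(cosine(features[node], features[n]) for n in nbrs)
        scores[node] = total / len(nbrs)
    return scores


# Toy graph: three similar "normal" nodes and one node with dissimilar features.
features = {
    "n1": [1.0, 0.1],
    "n2": [0.9, 0.2],
    "n3": [1.0, 0.0],
    "a1": [0.0, 1.0],
}
neighbors = {
    "n1": ["n2", "n3"],
    "n2": ["n1", "n3"],
    "n3": ["n1", "n2", "a1"],
    "a1": ["n3"],
}

affinity = local_affinity(features, neighbors)
for node in sorted(affinity, key=affinity.get):
    print(node, round(affinity[node], 3))
```

In this toy graph the dissimilar node `a1` receives the lowest affinity, while nodes surrounded by similar neighbors score close to 1. This is the kind of gap between normal and anomalous nodes that the prior relies on.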

Paper Structure (27 sections, 8 equations, 11 figures, 5 tables, 2 algorithms)

Figures (11)

  • Figure 1: Left: An exemplar graph in which the edge width indicates the level of affinity between two connected nodes. Normal nodes (e.g., $v_{n_i}$ and $v_{n_j}$) have stronger affinity to their neighboring normal nodes than anomaly nodes (e.g., $v_{a_i}$ and $v_{a_j}$) do, due to the homophily relation within the normal class. Our approach GGAD aims to generate outliers (e.g., $v_{o_i}$ and $v_{o_j}$) that assimilate the anomaly nodes well. Right: The outliers generated by methods like AEGIS ding2021inductive, which ignore structural relations, often mismatch the distribution of abnormal nodes (a) due to their false local affinity (c). By contrast, GGAD incorporates two important priors about anomaly nodes so that the generated outliers assimilate both the (b) feature representation and (d) local structure of abnormal nodes.
  • Figure 2: Overview of GGAD. (a) GGAD first initializes the outlier nodes based on the feature representations of the ego network of a labeled normal node. (b-c) It then incorporates the two anomaly node priors to optimize the outlier nodes so that they are well aligned with the anomalies. (d) The resulting outlier nodes are treated as negative samples to train a discriminative one-class classifier.
  • Figure 3: (a-c) t-SNE visualization of the node representations and (d-f) histograms of local affinity yielded by GGAD and its two variants on the GAD dataset T-Finance tang2022rethinking.
  • Figure 4: AUPRC results w.r.t. the number of training normal nodes ($R$% of $|\mathcal{V}|$). 'Baseline' denotes the performance of the best unsupervised GAD method.
  • Figure 5: AUPRC w.r.t. contamination.
  • ...and 6 more figures
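
Steps (a) and (d) of the Figure 2 overview can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the authors' implementation. An outlier is initialized here as the plain average of the neighbor features in a labeled normal node's ego network, standing in for whatever learned mapping GGAD uses. The prior-based optimization of steps (b-c) is omitted. The names `init_outlier` and `build_training_set`, and the label convention (0 = labeled normal, 1 = generated outlier), are hypothetical.

```python
def init_outlier(features, neighbors, ego):
    """Step (a), simplified: average the feature vectors of the ego node's neighbors."""
    nbrs = neighbors[ego]
    if not nbrs:
        raise ValueError(f"ego node {ego!r} has no neighbors to aggregate")
    dim = len(features[ego])
    return [sum(features[n][d] for n in nbrs) / len(nbrs) for d in range(dim)]


def build_training_set(features, neighbors, labeled_normals, ego_nodes):
    """Step (d), simplified: pair labeled normals (0) with generated outliers (1)."""
    samples = [(features[v], 0) for v in labeled_normals]
    samples += [(init_outlier(features, neighbors, v), 1) for v in ego_nodes]
    return samples


# Toy graph with three labeled normal nodes; one outlier is generated around n1.
features = {"n1": [1.0, 0.0], "n2": [0.0, 2.0], "n3": [2.0, 2.0]}
neighbors = {"n1": ["n2", "n3"], "n2": ["n1"], "n3": ["n1"]}

training_set = build_training_set(
    features, neighbors, labeled_normals=["n1", "n2", "n3"], ego_nodes=["n1"]
)
for vector, label in training_set:
    print(label, vector)
```

The resulting list of (vector, label) pairs is what a binary classifier would consume. In the full method, the outlier vectors would first be optimized under the two priors before being used as negative samples.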