Imbalanced Graph-Level Anomaly Detection via Counterfactual Augmentation and Feature Learning

Zitong Wang, Xuexiong Luo, Enfeng Song, Qiuqing Bai, Fu Lin

TL;DR

This work tackles the imbalance challenge in graph-level anomaly detection by introducing IGAD-CF, which combines counterfactual anomaly sample generation with feature learning that fuses node features and degree attributes. An adaptive weight learning module further tunes feature importance across datasets, while a dedicated loss design ensures balanced training between original and generated anomalies. The approach is validated on eight public GLAD datasets and four brain-graph datasets, achieving state-of-the-art AUC scores and demonstrating robustness via ablations, feature analyses, and visualizations. The findings suggest strong practical impact for reliable GLAD in diverse domains, including biomedical applications, with potential for broad generalization.

Abstract

Graph-level anomaly detection (GLAD) has become a popular field of study, attracting considerable attention across numerous downstream applications. Its core goal is to capture and highlight anomalous information within a given graph dataset. In most existing studies, anomalies are the minority class. This stark imbalance leads current GLAD methods to focus on learning the patterns of normal graphs, which degrades anomaly detection performance. Moreover, existing methods predominantly rely on the inherent features of nodes to identify anomalous graph patterns, which our experiments show to be suboptimal. In this work, we propose an imbalanced GLAD method based on counterfactual augmentation and feature learning. Specifically, we first construct anomalous samples via counterfactual learning to expand and balance the datasets. We then build a module based on Graph Neural Networks (GNNs) that uses degree attributes to complement the inherent attribute features of nodes. Next, we design an adaptive weight learning module that integrates features in a way tailored to each dataset, rather than treating all features as equally important. Extensive experiments against baselines on public datasets substantiate the robustness and effectiveness of the method. We also apply the model to brain disease datasets, which supports the generalization capability of our work. The source code of our work is available online.
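To make the idea of complementing node attributes with degree attributes more concrete, the following is a minimal sketch in plain Python. It is not the authors' implementation: the function names (`node_degrees`, `fuse_features`), the two-channel softmax weighting, and the use of raw concatenation instead of a GNN encoder are all assumptions made for illustration. In the actual model the weights would be learned during training rather than passed in by hand.

```python
import math


def node_degrees(adj):
    """Return the degree of each node from a dense 0/1 adjacency matrix."""
    return [sum(row) for row in adj]


def softmax(scores):
    """Numerically stable softmax over a list of raw scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_features(features, adj, channel_scores):
    """Concatenate weighted node attributes with a weighted degree channel.

    channel_scores holds two raw scores (attributes, degree). They stand in
    for learnable parameters; this hand-set version is an assumption of the
    sketch, not the paper's adaptive weight matrix.
    """
    w_attr, w_deg = softmax(channel_scores)
    degrees = node_degrees(adj)
    return [
        [w_attr * x for x in row] + [w_deg * d]
        for row, d in zip(features, degrees)
    ]


# A 3-node path graph with 2-dimensional node attributes (toy data).
adj = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
fused = fuse_features(features, adj, channel_scores=[0.0, 0.0])
```

With equal scores both channels receive weight 0.5. Raising the degree score shifts importance toward structural information, which is the kind of per-dataset adjustment the adaptive weight learning module is described as providing.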

Paper Structure (28 sections, 23 equations, 4 figures, 6 tables)

Figures (4)

  • Figure 1: The graph-level representation distributions of the normal samples, abnormal samples, and samples generated by counterfactual theory on the COX2 dataset [pang2019deep, morris2020tudataset].
  • Figure 2: The framework of IGAD-CF. The anomaly sample generation module employs counterfactual learning to perturb the adjacency matrices and feature matrices of normal samples, and is designed to keep the generated samples effective and reliable. The node feature learning module combines node features with degree attributes, capturing richer and more complete information. The adaptive weight learning module uses an adaptive weight matrix so that each feature is assigned a weight commensurate with its importance.
  • Figure 3: A visualization of the performance of IGAD-CF and selected baselines on the selected datasets, where pink and grey represent abnormal and normal samples, respectively. The more the pink is concentrated on the right side of the axis and the grey on the left side, the better the model separates the two classes.
  • Figure 4: Parameter experiments on the hyperparameter $\beta$, which is the weight given to the generated (expanded) samples in the overall training.
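
The role of $\beta$ in Figure 4 can be illustrated with a short sketch. This section does not give the exact loss, so the form below, a binary cross-entropy on the original samples plus a $\beta$-weighted binary cross-entropy on the generated samples, is an assumption. The names `bce` and `combined_loss` are hypothetical.

```python
import math


def bce(probs, labels, eps=1e-12):
    """Mean binary cross-entropy between predicted probabilities and 0/1 labels."""
    total = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(probs)


def combined_loss(p_orig, y_orig, p_gen, y_gen, beta):
    """Assumed objective: loss on original graphs plus beta times the loss
    on counterfactually generated graphs."""
    return bce(p_orig, y_orig) + beta * bce(p_gen, y_gen)


# Toy predictions: two original graphs and one generated anomaly.
loss = combined_loss([0.9, 0.2], [1, 0], [0.6], [1], beta=0.5)
```

Setting `beta=0.0` ignores the generated samples entirely, while larger values let them influence training more. Under this assumed form, that trade-off is what a sweep over $\beta$ would explore.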