Imbalanced Graph-Level Anomaly Detection via Counterfactual Augmentation and Feature Learning
Zitong Wang, Xuexiong Luo, Enfeng Song, Qiuqing Bai, Fu Lin
TL;DR
This work tackles the imbalance challenge in graph-level anomaly detection (GLAD) by introducing IGAD-CF, which combines counterfactual anomaly sample generation with feature learning that fuses node features and degree attributes. An adaptive weight learning module further tunes feature importance for each dataset, while a dedicated loss design balances training between original and generated anomalies. The approach is validated on eight public GLAD datasets and four brain-graph datasets, achieving state-of-the-art AUC scores, and its robustness is examined through ablations, feature analyses, and visualizations. These results suggest that the method can support reliable GLAD across diverse domains, including biomedical applications.
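The TL;DR mentions a loss that balances original and generated anomalies but does not give its form. As an illustration only, the Python sketch below assumes one plausible design, not the paper's: binary cross-entropy averaged separately over normal graphs, original anomalies, and counterfactually generated anomalies, with a hypothetical weight `lam` mixing the two anomaly groups. The names `bce`, `group_mean`, `balanced_loss`, and `lam` are all hypothetical, and the model is assumed to output an anomaly probability per graph.

```python
import math
from typing import List


def bce(p: float, y: int, eps: float = 1e-7) -> float:
    """Binary cross-entropy for a predicted anomaly probability p and label y (1 = anomalous)."""
    p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
    return -(y * math.log(p) + (1 - y) * math.log(1.0 - p))


def group_mean(probs: List[float], label: int) -> float:
    """Mean BCE over one group of graphs; an empty group contributes 0."""
    return sum(bce(p, label) for p in probs) / len(probs) if probs else 0.0


def balanced_loss(p_normal: List[float], p_orig: List[float],
                  p_gen: List[float], lam: float = 0.5) -> float:
    """Hypothetical balanced objective: each group is averaged on its own, so its size
    does not set its influence; `lam` trades off original vs. generated anomalies."""
    loss_normal = group_mean(p_normal, 0)
    loss_anomal = (1.0 - lam) * group_mean(p_orig, 1) + lam * group_mean(p_gen, 1)
    return 0.5 * (loss_normal + loss_anomal)


# 100 normal graphs, 2 original anomalies, 20 counterfactual (generated) anomalies.
loss = balanced_loss([0.1] * 100, [0.9, 0.9], [0.8] * 20)
```

Because each group is averaged separately in this sketch, duplicating normal graphs leaves the loss unchanged, so the majority class cannot dominate simply by outnumbering the anomalies.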
Abstract
Graph-level anomaly detection (GLAD) has become an important and popular field of study, attracting considerable attention across many downstream studies. Its core goal is to capture and highlight the anomalous information within a given graph dataset. In the datasets used by most existing studies, anomalies account for only a small fraction of instances. This stark imbalance leads current GLAD methods to focus disproportionately on learning the patterns of normal graphs, which degrades anomaly detection performance. Moreover, existing methods predominantly rely on the inherent features of nodes to identify anomalous graph patterns, which our experiments show to be suboptimal. In this work, we propose an imbalanced GLAD method via counterfactual augmentation and feature learning. Specifically, we first construct anomalous samples based on counterfactual learning to expand and balance the datasets. Additionally, we construct a module based on Graph Neural Networks (GNNs) that uses degree attributes to complement the inherent attribute features of nodes. We then design an adaptive weight learning module that integrates these features in a dataset-specific way, rather than treating all features as equally important. Extensive experiments against baselines on public datasets demonstrate the robustness and effectiveness of our method. We further apply the model to brain disease datasets, demonstrating its generalization capability. The source code of our work is available online.
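The abstract describes complementing node attributes with degree attributes and fusing the resulting features with adaptive weights, but it does not specify the GNN architecture, degree encoding, or fusion rule. The Python sketch below is a minimal illustration under stated assumptions, not the authors' module: degrees are one-hot encoded, a parameter-free mean-aggregation step stands in for a trained GNN layer, graph embeddings come from mean pooling, and the two views are scaled by softmax-normalized weights and concatenated. All function names are hypothetical.

```python
import math
from typing import Dict, List

Graph = Dict[int, List[int]]        # adjacency list: node id -> neighbour ids
Features = Dict[int, List[float]]   # node id -> feature vector


def degree_one_hot(adj: Graph, max_degree: int) -> Features:
    """Degree attribute: one-hot encoding of each node's degree, clipped at max_degree."""
    feats = {}
    for v, nbrs in adj.items():
        vec = [0.0] * (max_degree + 1)
        vec[min(len(nbrs), max_degree)] = 1.0
        feats[v] = vec
    return feats


def mean_aggregate(adj: Graph, x: Features) -> Features:
    """One parameter-free message-passing step: average each node with its neighbours.
    Stands in for a trained GNN layer in this sketch."""
    out = {}
    for v, nbrs in adj.items():
        group = [x[v]] + [x[u] for u in nbrs]
        out[v] = [sum(col) / len(group) for col in zip(*group)]
    return out


def mean_readout(x: Features) -> List[float]:
    """Graph-level embedding by mean pooling over all nodes."""
    rows = list(x.values())
    return [sum(col) / len(rows) for col in zip(*rows)]


def softmax(logits: List[float]) -> List[float]:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def fuse(views: List[List[float]], logits: List[float]) -> List[float]:
    """Scale each graph-level view by its softmax weight, then concatenate.
    In training, `logits` would be learnable, one per view (hypothetical design)."""
    fused: List[float] = []
    for w, view in zip(softmax(logits), views):
        fused.extend(w * val for val in view)
    return fused


# Usage on a 3-node star graph: node 0 is the hub.
adj = {0: [1, 2], 1: [0], 2: [0]}
attrs = {0: [1.0, 0.0], 1: [0.0, 1.0], 2: [0.0, 1.0]}   # inherent node attributes
z_attr = mean_readout(mean_aggregate(adj, attrs))
z_deg = mean_readout(mean_aggregate(adj, degree_one_hot(adj, max_degree=2)))
z = fuse([z_attr, z_deg], logits=[0.0, 0.0])  # equal weights before any training
```

Normalizing the weights with a softmax keeps them positive and summing to one, so learning them shifts relative emphasis between node attributes and degree attributes per dataset instead of treating both views as equally important.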
