A Label-Free Heterophily-Guided Approach for Unsupervised Graph Fraud Detection
Junjun Pan, Yixin Liu, Xin Zheng, Yizhen Zheng, Alan Wee-Chung Liew, Fuyi Li, Shirui Pan
TL;DR
This work tackles unsupervised graph fraud detection under heterophily by introducing HALO, a label-free heterophily metric, and HUGE, a two-component framework whose detector pairs a HALO-guided ranking loss with an asymmetric alignment loss to transfer structural signals from a GNN into an MLP encoder. HALO provides robust, bounded estimates of heterophily from node attributes, and the alignment-based detector learns local inconsistency scores while respecting relative heterophily orders. Across six real-world datasets, HUGE delivers strong and robust detection performance with favorable scalability, outperforming several state-of-the-art unsupervised graph anomaly detection (GAD) methods; ablation studies confirm the importance of both HALO and the alignment mechanism. The approach offers a practical, label-free pathway to combating fraud in large-scale graph-based online services, where labels are scarce or unavailable.
Abstract
Graph fraud detection (GFD) has rapidly advanced in protecting online services by identifying malicious fraudsters. Recent supervised GFD research highlights that heterophilic connections between fraudsters and benign users can greatly impact detection performance, since fraudsters tend to camouflage themselves by building more connections to benign users. Despite the promising performance of supervised GFD methods, their reliance on labels limits their applicability in unsupervised scenarios. Additionally, accurately capturing complex and diverse heterophily patterns without labels poses a further challenge. To fill this gap, we propose a Heterophily-guided Unsupervised Graph fraud dEtection approach (HUGE) for unsupervised GFD, which contains two essential components: a heterophily estimation module and an alignment-based fraud detection module. In the heterophily estimation module, we design a novel label-free heterophily metric called HALO, which captures the graph properties critical for GFD and can therefore estimate heterophily from node attributes alone. In the alignment-based fraud detection module, we develop a joint MLP-GNN architecture trained with a ranking loss and an asymmetric alignment loss. The ranking loss aligns the predicted fraud scores with the relative order of HALO, providing an extra robustness guarantee by comparing heterophily among non-adjacent nodes. Moreover, the asymmetric alignment loss effectively utilizes structural information while alleviating the feature-smoothing effect of GNNs. Extensive experiments on six datasets demonstrate that HUGE significantly outperforms competitors, showcasing its effectiveness and robustness.
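To make the idea of heterophily-guided ranking concrete, the following is a minimal NumPy sketch, not the authors' implementation. The abstract does not give HALO's formula, so the function attribute_heterophily below is a hypothetical stand-in: it scores each node by the mean Euclidean distance between its L2-normalized attributes and those of its neighbors, which is bounded in [0, 2]. The function pairwise_ranking_loss is likewise an assumed hinge-style loss that penalizes fraud scores whose order disagrees with the heterophily order over all node pairs, including non-adjacent ones. All names, the margin value, and the toy graph are illustrative assumptions.

```python
import numpy as np


def attribute_heterophily(x, edges):
    """Hypothetical label-free heterophily proxy (NOT HALO's definition).

    For each node, average the Euclidean distance between its
    L2-normalized attribute vector and those of its neighbors.
    Distances between unit vectors lie in [0, 2], so the result is bounded.
    """
    x = np.asarray(x, dtype=float)
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    x_hat = x / np.clip(norms, 1e-12, None)  # avoid division by zero

    n = x.shape[0]
    total = np.zeros(n)
    degree = np.zeros(n)
    for u, v in edges:  # undirected edges: update both endpoints
        d = np.linalg.norm(x_hat[u] - x_hat[v])
        total[u] += d
        total[v] += d
        degree[u] += 1
        degree[v] += 1
    # Isolated nodes keep a score of 0.
    return total / np.maximum(degree, 1)


def pairwise_ranking_loss(scores, heterophily, margin=0.1):
    """Assumed hinge-style ranking loss over all ordered node pairs.

    If node i is more heterophilic than node j, its fraud score should
    exceed that of j by at least `margin`. Pairs are not restricted to
    neighbors, so non-adjacent nodes are compared as well.
    """
    scores = np.asarray(scores, dtype=float)
    h = np.asarray(heterophily, dtype=float)
    # sign[i, j] is +1 if h[i] > h[j], -1 if h[i] < h[j], 0 if tied.
    sign = np.sign(h[:, None] - h[None, :])
    diff = scores[:, None] - scores[None, :]
    hinge = np.maximum(0.0, margin - sign * diff)
    mask = sign != 0  # ignore ties and self-pairs
    return float(hinge[mask].mean()) if mask.any() else 0.0


# Toy graph: nodes 0 and 1 share attributes, node 2 differs from both.
x = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
edges = [(0, 1), (1, 2)]
h = attribute_heterophily(x, edges)

# Scores ordered like h incur less loss than scores in the reverse order.
loss_aligned = pairwise_ranking_loss([0.0, 0.5, 1.0], h)
loss_reversed = pairwise_ranking_loss([1.0, 0.5, 0.0], h)
```

In this toy graph, node 2 receives the largest proxy value because its only neighbor has orthogonal attributes, and the loss is lower when the fraud scores follow the same order as the proxy. In a trainable setting the scores would come from the encoder and the loss would be written in a differentiable framework; the asymmetric alignment loss is not sketched here because the abstract does not specify its form.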
