
A Label-Free Heterophily-Guided Approach for Unsupervised Graph Fraud Detection

Junjun Pan, Yixin Liu, Xin Zheng, Yizhen Zheng, Alan Wee-Chung Liew, Fuyi Li, Shirui Pan

TL;DR

This work tackles unsupervised graph fraud detection under heterophily. It introduces HALO, a label-free heterophily metric, and HUGE, a two-component framework that pairs HALO-guided ranking with an asymmetric alignment loss to transfer structural signals from a GNN into an MLP encoder. HALO provides robust, bounded estimates of heterophily from node attributes, and the alignment-based detector learns local inconsistency scores while respecting relative heterophily orders. Across six real-world datasets, HUGE shows strong and robust detection performance with favorable scalability and outperforms several state-of-the-art unsupervised GAD (graph anomaly detection) methods. Ablation studies further validate the importance of HALO and the alignment mechanism. The approach offers a practical, label-free way to combat fraud in large-scale graph-based online services, where labels are scarce or unavailable.
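The exact HALO formula is not given in this summary. To make "bounded, label-free heterophily from node attributes" concrete, here is a hypothetical stand-in. It is NOT HALO. For each node it averages a rescaled cosine dissimilarity to its neighbors, which keeps every score in [0, 1]. The function name `attribute_heterophily` and the cosine choice are illustrative assumptions.

```python
import numpy as np

def attribute_heterophily(X, edges):
    """Hypothetical bounded heterophily proxy (NOT the paper's HALO).

    For each node, average (1 - cos(x_i, x_j)) / 2 over its neighbors j, so
    scores lie in [0, 1]: 0 means neighbors share the node's attribute
    direction, 1 means they point the opposite way. No labels are used.
    """
    n = X.shape[0]
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    Xn = X / np.clip(norms, 1e-12, None)  # unit-normalize each attribute row
    total = np.zeros(n)
    deg = np.zeros(n)
    for i, j in edges:  # undirected edge list
        d = (1.0 - Xn[i] @ Xn[j]) / 2.0
        total[i] += d
        total[j] += d
        deg[i] += 1
        deg[j] += 1
    return total / np.maximum(deg, 1)  # isolated nodes get 0

# Toy graph: nodes 0 and 1 share attributes, node 2 is their opposite.
X = np.array([[1.0, 0.0], [1.0, 0.0], [-1.0, 0.0]])
edges = [(0, 1), (1, 2)]
s = attribute_heterophily(X, edges)  # [0.0, 0.5, 1.0]
```

Node 1 sits between a similar and an opposite neighbor, so its score falls halfway. The bound comes from cosine similarity itself lying in [-1, 1].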

Abstract

Graph fraud detection (GFD) has rapidly advanced in protecting online services by identifying malicious fraudsters. Recent supervised GFD research highlights that heterophilic connections between fraudsters and users can greatly affect detection performance, because fraudsters tend to camouflage themselves by building more connections to benign users. Despite the promising performance of supervised GFD methods, their reliance on labels limits their application in unsupervised scenarios. Moreover, accurately capturing complex and diverse heterophily patterns without labels poses a further challenge. To fill this gap, we propose a Heterophily-guided Unsupervised Graph fraud dEtection approach (HUGE) for unsupervised GFD, which contains two essential components: a heterophily estimation module and an alignment-based fraud detection module. In the heterophily estimation module, we design a novel label-free heterophily metric called HALO, which captures the graph properties critical for GFD and thereby estimates heterophily accurately from node attributes. In the alignment-based fraud detection module, we develop a joint MLP-GNN architecture trained with a ranking loss and an asymmetric alignment loss. The ranking loss aligns the predicted fraud score with the relative order of HALO, providing an extra robustness guarantee by comparing heterophily among non-adjacent nodes. The asymmetric alignment loss exploits structural information while alleviating the feature-smoothing effects of GNNs. Extensive experiments on 6 datasets demonstrate that HUGE significantly outperforms competitors, showcasing its effectiveness and robustness.
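The ranking loss is described only at a high level: predicted fraud scores should follow the relative order of HALO, including for non-adjacent node pairs. One common way to encode such an order is a pairwise hinge (margin) ranking loss over randomly sampled pairs. The sketch below assumes that form. The margin, the sampling scheme, and the name `pairwise_ranking_loss` are assumptions, not the paper's specification.

```python
import numpy as np

def pairwise_ranking_loss(scores, heterophily, margin=0.1, n_pairs=1000, seed=0):
    """Assumed hinge-style pairwise ranking loss (not the paper's exact loss).

    Samples random node pairs (i, j), which need not be adjacent. Whenever
    heterophily[i] > heterophily[j], the fraud score of i should exceed that
    of j by at least `margin`; shortfalls are penalized linearly.
    """
    rng = np.random.default_rng(seed)
    n = len(scores)
    i = rng.integers(0, n, size=n_pairs)
    j = rng.integers(0, n, size=n_pairs)
    sign = np.sign(heterophily[i] - heterophily[j])  # +1, -1, or 0 (ties)
    mask = sign != 0  # ignore tied pairs, including i == j
    if not mask.any():
        return 0.0
    violation = margin - sign[mask] * (scores[i][mask] - scores[j][mask])
    return float(np.maximum(violation, 0.0).mean())

h = np.array([0.1, 0.5, 0.9])     # heterophily estimates (toy values)
good = np.array([0.0, 1.0, 2.0])  # scores ordered like h
bad = good[::-1]                  # scores in the reverse order
loss_good = pairwise_ranking_loss(good, h)  # 0.0
loss_bad = pairwise_ranking_loss(bad, h)    # positive
```

Because pairs are drawn from all nodes rather than from edges, the loss compares heterophily between nodes that are not neighbors. This matches the robustness argument in the abstract.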

Paper Structure

This paper contains 25 sections, 7 theorems, 18 equations, 3 figures, 7 tables, 1 algorithm.

Key Result

Theorem 1

Euclidean distance is unbounded. It satisfies minimal agreement, monotonicity and equal attribute tolerance.
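A small numerical check of two parts of Theorem 1. First, identical attributes give the minimum distance of 0, assuming "minimal agreement" means that identical attributes score the minimum. Second, the distance grows without bound as the attribute gap is scaled. The helper name `euclid` is illustrative. The paper's formal definitions of monotonicity and equal attribute tolerance are not reproduced here, so they are not tested.

```python
import numpy as np

def euclid(x, y):
    """Euclidean distance between two attribute vectors."""
    return float(np.linalg.norm(np.asarray(x, float) - np.asarray(y, float)))

x = np.array([1.0, 2.0, 3.0])
gap = np.array([1.0, 0.0, 0.0])

# Identical attributes give the minimum value, 0.
self_dist = euclid(x, x)  # 0.0

# Scaling the attribute gap by k scales the distance by k, so no finite
# upper bound exists.
dists = [euclid(x, x + k * gap) for k in (1, 10, 1000)]  # [1.0, 10.0, 1000.0]
```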

Figures (3)

  • Figure 1: Concept maps of existing GFD/GAD methods and our proposed unsupervised GFD method HUGE.
  • Figure 2: Workflow of HUGE. The label-free heterophily estimation module estimates node heterophily from attributes. Then, in the alignment-based unsupervised GFD module, a joint MLP-GNN architecture is trained with ranking and asymmetric alignment losses. The ranking loss ensures that the predicted inconsistency score follows the heterophily order, while the asymmetric alignment loss matches the neighbor inconsistency distribution of the MLP encoder to that of the GNN encoder. In the evaluation phase, the local inconsistency scores generated by the MLP encoder are used as the final fraud scores.
  • Figure 3: Visualization of results. (a) Parameter sensitivity w.r.t. $\alpha$ of the proposed HUGE. (b) Fraud score visualization on AmazonFull dataset. (c) AUROC and training time of HUGE and baselines on AmazonFull dataset.
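Figure 2 describes two computations: a per-node local inconsistency score, and an asymmetric alignment that pulls the MLP's inconsistency distribution toward the GNN's. The sketch below makes several assumptions that the summary does not confirm. Inconsistency is taken as the Euclidean distance between a node's embedding and the mean of its neighbors' embeddings. Scores are turned into a distribution with a softmax over nodes. Alignment is measured with KL(P_gnn || P_mlp), with the GNN side treated as a fixed target. All function names are hypothetical.

```python
import numpy as np

def local_inconsistency(H, adj):
    """Assumed score: distance from each embedding to its neighbors' mean.

    H: (n, d) node embeddings; adj: (n, n) 0/1 adjacency without self-loops.
    """
    deg = adj.sum(axis=1, keepdims=True)
    neigh_mean = (adj @ H) / np.maximum(deg, 1)
    return np.linalg.norm(H - neigh_mean, axis=1)

def softmax(v):
    e = np.exp(v - v.max())  # shift for numerical stability
    return e / e.sum()

def asymmetric_alignment(inc_mlp, inc_gnn):
    """Assumed form: KL(P_gnn || P_mlp) over node inconsistency distributions.

    The GNN distribution acts as a fixed target (in an autodiff framework it
    would be detached), so only the MLP is pulled toward the GNN.
    """
    p = softmax(inc_gnn)
    q = softmax(inc_mlp)
    return float(np.sum(p * (np.log(p) - np.log(q))))

adj = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)  # path 0-1-2
H_mlp = np.array([[0.0, 0.0], [1.0, 0.0], [3.0, 0.0]])
H_gnn = np.array([[0.0, 0.0], [1.0, 0.0], [2.0, 0.0]])
inc_mlp = local_inconsistency(H_mlp, adj)  # [1.0, 0.5, 2.0]
inc_gnn = local_inconsistency(H_gnn, adj)  # [1.0, 0.0, 1.0]
loss = asymmetric_alignment(inc_mlp, inc_gnn)
```

Following the caption, `inc_mlp` would serve as the fraud scores at evaluation time. Only the MLP is needed at inference in this setup, which is consistent with the scalability claim.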

Theorems & Definitions (14)

  • Theorem 1
  • proof
  • Theorem 2
  • proof
  • Theorem 3
  • proof
  • Theorem 4: Boundedness
  • proof
  • Theorem 5: Minimal Agreement
  • proof
  • ...and 4 more