
Crowdsourcing Fraud Detection over Heterogeneous Temporal MMMA Graph

Zequan Xu, Qihang Sun, Shaofeng Hu, Jieming Shi, Hui Li

TL;DR

Crowdsourcing fraud in MMMAs is hard to detect because the data are heterogeneous and dynamic and supervision is limited. The authors propose CMT, a Contrastive Multi-view Learning framework on Heterogeneous Temporal Graphs. CMT integrates an HG-Encoder to handle heterogeneity, two history views (Temporal Snapshot sequences and User Relation sequences), data augmentation, and a Transformer-based Contrastive Sequence Encoder, and learns robust representations in a self-supervised manner. Pretraining with contrastive and binary objectives, followed by a downstream detector, yields state-of-the-art results on industry-scale WeChat data and carries its gains over to FinGraph, while also revealing actionable fraud patterns. By jointly modeling multi-relational structure and temporal evolution under limited labels, the approach advances graph anomaly detection and has practical implications for large-scale fraud monitoring in MMMAs and beyond.
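The TL;DR describes CMT as learning user representations by contrasting two history views. A minimal sketch of that idea follows, assuming a standard InfoNCE-style loss. The summary does not state CMT's exact loss, similarity function, or temperature, so `info_nce`, `cosine`, and `temperature=0.5` are illustrative choices, and the binary objective is not modeled. Row i of each view is one user's embedding, the two views of the same user form the positive pair, and the other users in the batch act as negatives.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce(view_a, view_b, temperature=0.5):
    """Generic InfoNCE loss between two aligned batches of embeddings.

    Assumption (not from the paper): view_a[i] and view_b[i] are two views of
    the same user (e.g. a snapshot-based and a relation-based encoding), and
    every view_b[j] with j != i is a negative for user i.
    """
    n = len(view_a)
    total = 0.0
    for i in range(n):
        logits = [cosine(view_a[i], view_b[j]) / temperature for j in range(n)]
        m = max(logits)  # subtract the max before exp for numerical stability
        log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
        total += log_sum_exp - logits[i]  # -log softmax of the positive pair
    return total / n


# Hypothetical 2-D embeddings for two users.
anchor = [[1.0, 0.0], [0.0, 1.0]]
matched = [[0.9, 0.1], [0.1, 0.9]]  # each user's second view points the same way
swapped = [[0.1, 0.9], [0.9, 0.1]]  # second views assigned to the wrong users
print(info_nce(anchor, matched) < info_nce(anchor, swapped))  # True
```

Minimizing this loss pulls the two views of each user together and pushes apart views of different users. A real implementation would typically compute it on batched tensors in a deep learning framework; the explicit loops here only keep the sketch dependency-free.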

Abstract

The rise of the click farm business using Multi-purpose Messaging Mobile Apps (MMMAs) tempts cybercriminals to perpetrate crowdsourcing fraud that causes financial losses to click farm workers. In this paper, we propose a novel contrastive multi-view learning method named CMT for crowdsourcing fraud detection over the heterogeneous temporal graph (HTG) of an MMMA. CMT captures both the heterogeneity and the dynamics of the HTG and generates high-quality representations for crowdsourcing fraud detection in a self-supervised manner. We deploy CMT to detect crowdsourcing fraud on an industry-scale HTG of WeChat, a representative MMMA, where it significantly outperforms competing methods. CMT also shows promising results for fraud detection on a large-scale public financial HTG, indicating that it can be applied to other graph anomaly detection tasks. We provide our implementation at https://github.com/KDEGroup/CMT.
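The abstract frames MMMA data as a heterogeneous temporal graph (HTG), which the paper formalizes in Definition 3.1; that definition is not reproduced in this summary. As a rough sketch under that caveat, assuming an HTG can be stored as a node-type map plus a list of typed, timestamped edges, the code below shows how heterogeneity (node and relation types) and dynamics (timestamps bucketed into snapshots) might be represented. `Edge`, `HTG`, `snapshots`, `typed_degree`, and the relation names are hypothetical and are not taken from the CMT repository.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Edge:
    src: str  # source node id
    dst: str  # destination node id
    rel: str  # relation (edge) type; names below are hypothetical
    t: int    # timestamp


@dataclass
class HTG:
    node_type: dict  # node id -> node type, e.g. "user" or "group" (hypothetical)
    edges: list      # list of Edge objects

    def snapshots(self, window):
        """Bucket edges into consecutive time windows of length `window`.

        Each snapshot maps relation type -> edges, so it stays heterogeneous.
        Windows that contain no edges are omitted.
        """
        buckets = defaultdict(lambda: defaultdict(list))
        for e in self.edges:
            buckets[e.t // window][e.rel].append(e)
        return [dict(buckets[k]) for k in sorted(buckets)]

    def typed_degree(self, node):
        """Count a node's incident edges per (relation type, neighbor node type)."""
        counts = defaultdict(int)
        for e in self.edges:
            if e.src == node:
                counts[(e.rel, self.node_type[e.dst])] += 1
            elif e.dst == node:
                counts[(e.rel, self.node_type[e.src])] += 1
        return dict(counts)


g = HTG(
    node_type={"u1": "user", "u2": "user", "g1": "group"},
    edges=[
        Edge("u1", "u2", "send_message", 5),
        Edge("u1", "g1", "join_group", 12),
        Edge("u2", "g1", "join_group", 14),
    ],
)
print(len(g.snapshots(window=10)))  # 2
print(g.typed_degree("u1"))  # {('send_message', 'user'): 1, ('join_group', 'group'): 1}
```

Bucketing by `t // window` yields fixed-width, non-overlapping snapshots. The window size, and whether CMT builds its Temporal Snapshot view this way, are assumptions of the sketch.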

Paper Structure

This paper contains 27 sections, 8 equations, 10 figures, and 5 tables.

Figures (10)

  • Figure 1: Crowdsourcing fraud in WeChat.
  • Figure 2: Overview of CMT.
  • Figure 3: Different users' behavioral sequences.
  • Figure 4: Data augmentation in CMT.
  • Figure 5: An example for substitution.
  • ...and 5 more figures

Theorems & Definitions (1)

  • Definition 3.1: HTG