VecAug: Unveiling Camouflaged Frauds with Cohort Augmentation for Enhanced Detection

Fei Xiao; Shaofeng Cai; Gang Chen; H. V. Jagadish; Beng Chin Ooi; Meihui Zhang

TL;DR

VecAug addresses camouflaged fraud in e-commerce by leveraging personalized cohorts learned from user behavioral sequences. It introduces a four-phase framework (vector burn-in, cohort identification, augmented training, and prediction) with an attentive cohort aggregator and label-aware negative separation to robustly augment target-user representations. The method optimizes a joint objective combining main prediction, supervised contrastive learning, and latent-space alignment, and it integrates with existing models and vector databases to scale in practice. Across three datasets, VecAug yields consistent improvements in AUC and in recall at high precision, and it demonstrates strong robustness, efficiency, and the ability to unveil camouflaged fraud when paired with graph-based detectors.
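One plausible way to write the joint objective described above is as a weighted sum of its three terms. This is only a sketch: the additive form, the weights $\lambda_1$ and $\lambda_2$, and the loss names are assumptions made for illustration, not notation taken from the paper.

```latex
\mathcal{L}
  = \mathcal{L}_{\mathrm{main}}
  + \lambda_1 \, \mathcal{L}_{\mathrm{scl}}
  + \lambda_2 \, \mathcal{L}_{\mathrm{align}},
\qquad \lambda_1, \lambda_2 \ge 0
```

Here $\mathcal{L}_{\mathrm{main}}$ stands for the main prediction loss, $\mathcal{L}_{\mathrm{scl}}$ for the supervised contrastive term, and $\mathcal{L}_{\mathrm{align}}$ for the latent-space alignment term. Setting both weights to zero would recover training of the base model alone.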

Abstract

Fraud detection presents a challenging task characterized by ever-evolving fraud patterns and scarce labeled data. Existing methods predominantly rely on graph-based or sequence-based approaches. While graph-based approaches connect users through shared entities to capture structural information, they remain vulnerable to fraudsters who can disrupt or manipulate these connections. In contrast, sequence-based approaches analyze users' behavioral patterns, offering robustness against tampering but overlooking the interactions between similar users. Inspired by cohort analysis in retention and healthcare, this paper introduces VecAug, a novel cohort-augmented learning framework that addresses these challenges by enhancing the representation learning of target users with personalized cohort information. To this end, we first propose a vector burn-in technique for automatic cohort identification, which retrieves a task-specific cohort for each target user. Then, to fully exploit the cohort information, we introduce an attentive cohort aggregation technique for augmenting target user representations. To improve the robustness of such cohort augmentation, we also propose a novel label-aware cohort neighbor separation mechanism to distance negative cohort neighbors and calibrate the aggregated cohort information. By integrating this cohort information with target user representations, VecAug enhances the modeling capacity and generalization capabilities of the model to be augmented. Our framework is flexible and can be seamlessly integrated with existing fraud detection models. We deploy our framework on e-commerce platforms and evaluate it on three fraud detection datasets, and results show that VecAug improves the detection performance of base models by up to 2.48% in AUC and 22.5% in R@P$_{0.9}$, outperforming state-of-the-art methods significantly.
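The cohort identification and attentive aggregation steps described in the abstract can be illustrated with a small, self-contained sketch. It is not the authors' implementation: the function names (`retrieve_cohort`, `attentive_aggregate`, `augment`), the use of cosine similarity for retrieval, the dot-product softmax attention, and the concatenation of target and cohort vectors are all assumptions chosen for clarity. In practice the user vectors would come from a trained encoder and the nearest-neighbor search from a vector database.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def retrieve_cohort(target, pool, k):
    """Indices of the k pool vectors most similar to the target.

    Brute-force stand-in (assumption) for a vector-database lookup.
    """
    ranked = sorted(range(len(pool)),
                    key=lambda i: cosine(target, pool[i]),
                    reverse=True)
    return ranked[:k]


def attentive_aggregate(target, neighbors, temperature=1.0):
    """Softmax-weighted average of neighbor vectors.

    Attention scores are scaled dot products with the target
    (a hypothetical choice; the paper's aggregator may differ).
    """
    if not neighbors:
        return list(target), []
    scores = [sum(x * y for x, y in zip(target, n)) / temperature
              for n in neighbors]
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    cohort = [sum(w * n[d] for w, n in zip(weights, neighbors))
              for d in range(len(target))]
    return cohort, weights


def augment(target, pool, k=2):
    """Concatenate the target vector with its aggregated cohort vector."""
    idx = retrieve_cohort(target, pool, k)
    cohort, weights = attentive_aggregate(target, [pool[i] for i in idx])
    return list(target) + cohort, idx, weights


if __name__ == "__main__":
    pool = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]  # toy user vectors
    augmented, idx, weights = augment([1.0, 0.0], pool, k=2)
    print(idx, weights, augmented)
```

The augmented vector would then be passed to the base model's prediction head. The label-aware separation of negative neighbors is a training-time mechanism and is not shown in this sketch.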

Paper Structure (32 sections, 9 equations, 9 figures, 7 tables, 2 algorithms)

Introduction
Problem Definition
Fraud Detection
Cohort Augmentation for Fraud Detection
Methodologies
Overall Framework
Vector Burn-in Phase
Cohort Identification Phase
Cohort-Augmented Training Phase
Prediction Phase
Multi-Objective Joint Training
Experiments
Experimental Setup
Datasets
Baseline Methods
...and 17 more sections

Figures (9)

Figure 1: A toy example depicting the behavioral sequences of three users: User B and User C have similar behavior sequences, whereas User A has quite different ones.
Figure 2: The proposed learning-based cohort augmentation framework VecAug. VecAug seamlessly integrates with any existing fraud detection models to uncover hidden connections between users and utilize cohort information for enhanced fraud detection.
Figure 3: The detailed illustration of cohort augmentation module and prediction module. The neighbor transformation blocks are identical. The dashed arrows denote computational flows involved only in the augmented training phase.
Figure 4: The augmentation neighbors and negative neighbors for the target sample $u_i$ in vector burn-in space and augmented representation space.
Figure 5: Performance comparison of VecAug and its two variants in fraud detection tasks.
...and 4 more figures

Theorems & Definitions (3)

Definition 1
Definition 2
Definition 3
