VecAug: Unveiling Camouflaged Frauds with Cohort Augmentation for Enhanced Detection
Fei Xiao, Shaofeng Cai, Gang Chen, H. V. Jagadish, Beng Chin Ooi, Meihui Zhang
TL;DR
VecAug addresses camouflaged fraud in e-commerce by leveraging personalized cohorts learned from user behavioral sequences. It introduces a four-phase framework (vector burn-in, cohort identification, augmented training, and prediction) that uses an attentive cohort aggregator and label-aware negative separation to robustly augment target-user representations. The method optimizes a joint objective combining the main prediction loss, supervised contrastive learning, and latent-space alignment, and it integrates with existing models and vector databases to scale in practice. Across three datasets, VecAug yields consistent improvements in AUC and in recall at high precision. It also demonstrates strong robustness and efficiency, and it can unveil camouflaged fraud when paired with graph-based detectors.
Abstract
Fraud detection is a challenging task characterized by ever-evolving fraud patterns and scarce labeled data. Existing methods predominantly rely on graph-based or sequence-based approaches. Graph-based approaches connect users through shared entities to capture structural information, but they remain vulnerable to fraudsters who can disrupt or manipulate these connections. In contrast, sequence-based approaches analyze users' behavioral patterns, which makes them robust against tampering, but they overlook the interactions between similar users. Inspired by cohort analysis in retention and healthcare, this paper introduces VecAug, a novel cohort-augmented learning framework that addresses these challenges by enhancing the representation learning of target users with personalized cohort information. To this end, we first propose a vector burn-in technique for automatic cohort identification, which retrieves a task-specific cohort for each target user. Then, to fully exploit the cohort information, we introduce an attentive cohort aggregation technique for augmenting target user representations. To improve the robustness of this cohort augmentation, we also propose a novel label-aware cohort neighbor separation mechanism that distances negative cohort neighbors and calibrates the aggregated cohort information. By integrating this cohort information with target user representations, VecAug enhances the modeling capacity and generalization capabilities of the base model being augmented. Our framework is flexible and can be seamlessly integrated with existing fraud detection models. We deploy our framework on e-commerce platforms and evaluate it on three fraud detection datasets. The results show that VecAug improves the detection performance of base models by up to 2.48\% in AUC and 22.5\% in R@P$_{0.9}$, significantly outperforming state-of-the-art methods.
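The cohort retrieval and attentive aggregation steps described above can be illustrated with a minimal sketch. It is not the authors' implementation: the cosine similarity measure, the softmax attention weights, the concatenation step, the brute-force search (standing in for a vector database), and all function names are assumptions made for illustration only.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    na, nb = math.sqrt(dot(a, a)), math.sqrt(dot(b, b))
    return dot(a, b) / (na * nb) if na and nb else 0.0

def retrieve_cohort(target, pool, k):
    """Hypothetical cohort lookup: the k pool vectors most similar to the target.
    A brute-force scan stands in for a vector-database query."""
    return sorted(pool, key=lambda v: cosine(target, v), reverse=True)[:k]

def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def aggregate_cohort(target, cohort, temperature=1.0):
    """Assumed attention scheme: weight each neighbor by softmax(similarity / temperature)
    and return the weighted average together with the weights."""
    weights = softmax([cosine(target, v) / temperature for v in cohort])
    agg = [sum(w * v[i] for w, v in zip(weights, cohort)) for i in range(len(target))]
    return agg, weights

def augment(target, cohort_vec):
    """Assumed fusion step: concatenate the target and aggregated cohort vectors."""
    return list(target) + list(cohort_vec)

# Toy user embeddings (made-up values, not from any dataset).
target = [1.0, 0.0]
pool = [[0.9, 0.1], [0.0, 1.0], [1.0, 0.0], [-1.0, 0.0]]
cohort = retrieve_cohort(target, pool, k=2)
cohort_vec, weights = aggregate_cohort(target, cohort)
augmented = augment(target, cohort_vec)
```

In this sketch the augmented vector would be passed to the base model's prediction head. The label-aware separation of negative cohort neighbors and the joint training objective are not covered here.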
