
FRAUDGUESS: Spotting and Explaining New Types of Fraud in Million-Scale Financial Data

Robson L. F. Cordeiro, Meng-Chieh Lee, Christos Faloutsos

TL;DR

FraudGuess addresses the challenge of spotting novel fraud types in million-scale financial data while providing interpretable justification for analysts. It decomposes the problem into Detection (FraudGuess-D), which identifies micro-clusters of lockstep behavior in a carefully designed feature space, and Justification (FraudGuess-J), which provides visual evidence such as heatmaps and an interactive dashboard for deep dives. On real AFI data, FraudGuess discovers three new behaviors, two of which domain experts deemed fraudulent or suspicious, and its detection step runs in time linear in the input size, making it suitable for large deployments. The approach emphasizes explainability over black-box models and outlines plans for deployment and reproducibility through open-source code and synthetic data.

Abstract

Given a set of financial transactions (who buys from whom, when, and for how much), as well as prior information about buyers and sellers, how can we find fraudulent transactions? If we have labels for some transactions of known fraud types, we can build a classifier. However, we also want to find new types of fraud, still unknown to the domain experts ('Detection'). Moreover, we want to provide experts with evidence that supports our opinion ('Justification'). In this paper, we propose FRAUDGUESS, which achieves both goals: (a) for 'Detection', it spots new types of fraud as micro-clusters in a carefully designed feature space; (b) for 'Justification', it uses visualizations and heatmaps as evidence, as well as an interactive dashboard for deep dives. FRAUDGUESS is used in real life and is currently being considered for deployment at an Anonymous Financial Institution (AFI). We therefore also present the three new behaviors that FRAUDGUESS discovered in a real, million-scale financial dataset. Domain experts deemed two of these behaviors fraudulent or suspicious, catching hundreds of fraudulent transactions that would otherwise have gone unnoticed.
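The abstract frames 'Detection' as spotting new fraud types as micro-clusters in a feature space. The paper's exact clustering procedure is not reproduced here; the sketch below is a hypothetical, simplified grid-density heuristic, assuming each card is summarized by a vector of non-negative features (for instance, txns per day and the share of its most common amount). A cell on a log-scaled grid is flagged when it is small (at least `min_size` cards, but no more than `max_frac` of all cards) and isolated (its axis-aligned neighbour cells are empty). The names `grid_cell`, `find_micro_clusters`, `cell_size`, `min_size`, and `max_frac` are illustrative assumptions.

```python
from collections import Counter
from math import floor, log10

def grid_cell(point, cell_size):
    """Map a non-negative feature vector to an integer cell on a log10(1 + x) grid."""
    return tuple(floor(log10(1.0 + x) / cell_size) for x in point)

def find_micro_clusters(features, cell_size=0.25, min_size=5, max_frac=0.01):
    """Hypothetical micro-cluster detector (NOT the FraudGuess-D algorithm).

    features: dict mapping card id -> tuple of non-negative feature values.
    Returns {cell: sorted card ids} for cells holding between `min_size` cards
    and `max_frac` of all cards, whose axis-aligned neighbour cells are empty.
    """
    cells = {card: grid_cell(p, cell_size) for card, p in features.items()}
    counts = Counter(cells.values())
    limit = max_frac * len(features)
    flagged = {}
    for cell, n in counts.items():
        if n < min_size or n > limit:
            continue  # too sparse to be a cluster, or too large to be "micro"
        # Neighbours differ by +/-1 along exactly one feature axis.
        neighbours = [
            cell[:i] + (cell[i] + d,) + cell[i + 1:]
            for i in range(len(cell))
            for d in (-1, 1)
        ]
        if all(counts.get(nb, 0) == 0 for nb in neighbours):
            flagged[cell] = sorted(c for c, cl in cells.items() if cl == cell)
    return flagged

# Example: 1,000 ordinary cards plus 10 synthetic cards moving in lockstep.
features = {f"normal-{i}": (i % 5 + 1, (i % 7) / 10) for i in range(1000)}
features.update({f"lockstep-{j}": (66.0, 0.9) for j in range(10)})
print(find_micro_clusters(features))  # flags only the cell holding the lockstep cards
```

The thresholds are arbitrary; in this framing a flagged cell is only a lead for the 'Justification' step, not a verdict.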

Paper Structure

This paper contains 28 sections, 2 theorems, 5 figures, 2 tables, 1 algorithm.

Key Result

Lemma 1

FraudGuess-D requires time linear in the input size.
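Lemma 1 states that FraudGuess-D runs in time linear in the input size. One way such a bound can arise is sketched below, assuming the detection features are per-card aggregates: a single pass over the transactions, with constant expected work per transaction in hash maps, computes hypothetical stand-ins for the Figure 2 features. The definitions used here (mean inter-arrival gap for 'MG-t', share of the most common amount for 'MG-$', share of amounts at or below `small_amount` for 'small-$') and the names `card_features` and `small_amount` are assumptions, not the paper's definitions.

```python
from collections import Counter, defaultdict

def card_features(txns, small_amount=1.0):
    """Hypothetical per-card features computed in one pass (O(n) expected time).

    txns: iterable of (card_id, timestamp_seconds, amount) tuples, in any order.
    Returns {card_id: (n_txns, mean_gap_s, top_amount_share, small_share)}.
    """
    count = defaultdict(int)
    first, last = {}, {}
    amounts = defaultdict(Counter)   # card -> Counter of exact amounts
    small = defaultdict(int)
    for card, ts, amount in txns:    # constant expected work per transaction
        count[card] += 1
        first[card] = min(first.get(card, ts), ts)
        last[card] = max(last.get(card, ts), ts)
        amounts[card][amount] += 1
        if amount <= small_amount:
            small[card] += 1

    feats = {}
    for card, n in count.items():    # total work bounded by the number of txns
        # Mean inter-arrival gap = time span / (n - 1); no per-card sort needed.
        gap = (last[card] - first[card]) / (n - 1) if n > 1 else float("inf")
        top_share = max(amounts[card].values()) / n   # "MG-$" stand-in
        feats[card] = (n, gap, top_share, small[card] / n)
    return feats
```

The mean gap is chosen because it needs only each card's first and last timestamps; a median gap would require sorting each card's timestamps and break the strict linear bound.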

Figures (5)

  • Figure 1: FraudGuess found a NEW, suspicious behavior and justified its decisions. (a) 'Detection' (Goal G1): our method caught tens of suspicious cards (points inside the red circle) exhibiting the new behavior "Double Machine-gun", with many synchronized txns of the same amount. (b) 'Justification' (Goal G2): FraudGuess justified its decisions via visual inspection. We showcase the interactive dashboard of the suspicious (and later confirmed fraudster) card '93522' detected in (a); note its unusual behavior: 66 txns/day, often of $0.99, roughly every 3 min., with a small merchant, and occasionally late at night.
  • Figure 2: Goal G1 -- Detection: 'MG-t': machine-gun behavior over time (e.g., every few seconds); 'MG-$': ditto, over amounts; 'small-$': unusually small, repetitive amounts.
  • Figure 3: FraudGuess is scalable: it scales linearly with input size and takes only 10 minutes for 3.2M transactions.
  • Figure 4: Example of "Penny Hunter": notice the high count of small-value transactions, every ≈15 seconds.
  • Figure 5: Example of "Bursty Poster": notice the bursty activity at an unusual time of day (10 PM).
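Figures 1(b) and 5 justify a flag partly through time-of-day evidence (activity late at night, a burst at 10 PM). The following is a hypothetical sketch of a weekday-by-hour count heatmap of the kind a justification view might display; it is not the FraudGuess-J dashboard code. The function names `weekday_hour_heatmap` and `render`, and reading timestamps as UTC POSIX seconds, are assumptions.

```python
from collections import Counter
from datetime import datetime, timezone

DAYS = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]

def weekday_hour_heatmap(timestamps):
    """Count txns per (weekday, hour): 7 rows (Mon..Sun) x 24 columns (00..23).

    Timestamps are POSIX seconds read in UTC (an assumption); a real dashboard
    would likely use the cardholder's or merchant's local time zone.
    """
    cells = Counter()
    for ts in timestamps:
        t = datetime.fromtimestamp(ts, tz=timezone.utc)
        cells[(t.weekday(), t.hour)] += 1
    return [[cells[(d, h)] for h in range(24)] for d in range(7)]

def render(grid):
    """Plain-text rendering: one row per weekday, '.' where there is no activity."""
    for name, row in zip(DAYS, grid):
        print(name, " ".join(f"{c:2d}" if c else " ." for c in row))

# Example: three txns on Thursday 1970-01-01 (UTC), two of them around 10 PM.
render(weekday_hour_heatmap([0, 79200, 79260]))
```

A burst like the one in Figure 5 would show up as a single bright cell in an otherwise quiet row.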

Theorems & Definitions (2)

  • Lemma 1
  • Lemma 2