
Graph Feature Preprocessor: Real-time Subgraph-based Feature Extraction for Financial Crime Detection

Jovan Blanuša, Maximo Cravero Baraja, Andreea Anghel, Luc von Niederhäusern, Erik Altman, Haris Pozidis, Kubilay Atasu

TL;DR

Combining the Graph Feature Preprocessor with gradient-boosting-based machine learning models can detect illicit transactions with higher minority-class F1 scores than standard graph neural networks on anti-money laundering and phishing datasets.

Abstract

In this paper, we present the Graph Feature Preprocessor, a software library that detects typical money laundering patterns in financial transaction graphs in real time. These patterns are used to produce a rich set of transaction features for downstream machine learning training and inference tasks, such as the detection of fraudulent financial transactions. We show that these enriched transaction features dramatically improve the prediction accuracy of gradient-boosting-based machine learning models. Our library exploits multicore parallelism, maintains a dynamic in-memory graph, and efficiently mines subgraph patterns in the incoming transaction stream, which allows it to operate in a streaming manner. Our solution, which combines the Graph Feature Preprocessor with gradient-boosting-based machine learning models, can detect illicit transactions with higher minority-class F1 scores than standard graph neural networks on anti-money laundering and phishing datasets. In addition, the end-to-end throughput of our solution running on a multicore CPU exceeds that of the graph neural network baselines running on a powerful V100 GPU. Overall, the combination of high accuracy, high throughput, and low latency demonstrates the practical value of our library in real-world applications.
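To make the streaming idea more concrete, here is a minimal Python sketch, not the library's implementation, of a dynamic in-memory graph that emits a few features for each incoming transaction: the sender's out-degree, the receiver's in-degree, and the number of bounded-length cycles that the new edge closes within a time window. The class name `StreamingGraph`, its parameters `window` and `max_cycle_len`, and the choice of these particular features are assumptions made for illustration only.

```python
from collections import defaultdict


class StreamingGraph:
    """Hypothetical sketch of a dynamic in-memory transaction graph.

    Illustrative only: not the Graph Feature Preprocessor API.
    """

    def __init__(self, window, max_cycle_len=4):
        self.window = window                # assumed: only edges with ts >= t - window are searched
        self.max_cycle_len = max_cycle_len  # assumed: longest cycle to look for
        self.out_edges = defaultdict(list)  # account -> [(neighbor, timestamp), ...]
        self.in_degree = defaultdict(int)   # account -> number of incoming edges seen so far

    def _count_paths(self, start, target, t_min, max_hops):
        # Depth-bounded DFS counting simple paths start -> ... -> target that use
        # at most max_hops edges, all with timestamp >= t_min.
        # Parallel edges are counted as distinct paths.
        count = 0
        stack = [(start, 0, frozenset([start]))]
        while stack:
            node, hops, visited = stack.pop()
            if hops == max_hops:
                continue
            for nbr, ts in self.out_edges.get(node, ()):
                if ts < t_min:
                    continue  # edge lies outside the time window
                if nbr == target:
                    count += 1
                elif nbr not in visited:
                    stack.append((nbr, hops + 1, visited | {nbr}))
        return count

    def add_transaction(self, src, dst, ts):
        # A cycle closed by the new edge src -> dst is a path dst -> ... -> src
        # of at most max_cycle_len - 1 edges inside the time window.
        cycles = self._count_paths(dst, src, ts - self.window, self.max_cycle_len - 1)
        self.out_edges[src].append((dst, ts))
        self.in_degree[dst] += 1
        return {
            "src_out_degree": len(self.out_edges[src]),  # all-time, not windowed
            "dst_in_degree": self.in_degree[dst],         # all-time, not windowed
            "cycles_closed": cycles,
        }


g = StreamingGraph(window=10, max_cycle_len=3)
g.add_transaction("A", "B", 1)
g.add_transaction("B", "C", 2)
print(g.add_transaction("C", "A", 3))  # closes the cycle A -> B -> C -> A
# {'src_out_degree': 1, 'dst_in_degree': 1, 'cycles_closed': 1}
```

Each incoming edge triggers its own independent bounded search, which loosely corresponds to the per-transaction cycle search described for Figure 4 and is what makes that work easy to spread across threads. A real streaming system would also evict expired edges rather than skipping them during the search.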

Paper Structure (9 sections, 10 figures, 5 tables, 1 algorithm)

This paper contains 9 sections, 10 figures, 5 tables, 1 algorithm.

Figures (10)

  • Figure 1: Crime patterns in financial transaction graphs.
  • Figure 2: Overview of our graph ML pipeline for the detection of suspicious financial transactions.
  • Figure 3: Our Graph Feature Preprocessor is offered as a scikit-learn preprocessor with fit and transform methods.
  • Figure 4: Fine-grained parallelism exploited by GFP. The library searches for cycles independently for each input transaction by recursively exploring the transaction graph. The coarse-grained approach would use only four threads, while the fine-grained approach uses eleven threads.
  • Figure 5: Enumeration of scatter-gather patterns that contain the edge $u \rightarrow v$ with $v$ being an intermediate vertex.
  • ...and 5 more figures
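Figure 3 states that the library is offered as a scikit-learn preprocessor with fit and transform methods, and Figure 5 refers to enumerating scatter-gather patterns that contain an edge $u \rightarrow v$ with $v$ as an intermediate vertex. The hypothetical sketch below combines both ideas in plain Python: `fit` builds an out-neighbor map, and `transform` appends to each transaction the number of destinations $d$ that $u$ reaches through at least `min_width` distinct intermediates, $v$ among them. The class name `ToyScatterGatherFeatures`, the `min_width` parameter, and this exact counting rule are assumptions rather than the library's API or definitions; the class only mimics the scikit-learn fit/transform convention and does not import scikit-learn.

```python
from collections import defaultdict


class ToyScatterGatherFeatures:
    """Hypothetical fit/transform preprocessor in the scikit-learn style.

    Illustrative only: not the Graph Feature Preprocessor API.
    """

    def __init__(self, min_width=2):
        # Assumed parameter: minimum number of distinct intermediates.
        self.min_width = min_width

    def fit(self, X, y=None):
        # X: rows of (src, dst, *other_columns); build an out-neighbor map.
        self.out_ = defaultdict(set)
        for src, dst, *_ in X:
            self.out_[src].add(dst)
        return self

    def _scatter_gather(self, u, v):
        # Count destinations d with u -> v -> d such that at least
        # min_width distinct intermediates w satisfy u -> w -> d.
        candidates = self.out_.get(u, set()) | {v}  # v counts even if u -> v was not fitted
        count = 0
        for d in self.out_.get(v, ()):
            if d in (u, v):
                continue  # skip cycles back to u and self-loops at v
            width = sum(1 for w in candidates
                        if w not in (u, d) and d in self.out_.get(w, ()))
            if width >= self.min_width:
                count += 1
        return count

    def transform(self, X):
        # Append the scatter-gather count as an extra column to each row.
        return [tuple(row) + (self._scatter_gather(row[0], row[1]),) for row in X]

    def fit_transform(self, X, y=None):
        return self.fit(X).transform(X)


edges = [("A", "B"), ("A", "C"), ("B", "D"), ("C", "D"), ("A", "E"), ("E", "F")]
print(ToyScatterGatherFeatures(min_width=2).fit_transform(edges))
# [('A', 'B', 1), ('A', 'C', 1), ('B', 'D', 0), ('C', 'D', 0), ('A', 'E', 0), ('E', 'F', 0)]
```

Here A scatters to B and C, which both gather at D, so the edges A → B and A → C each belong to one width-2 scatter-gather pattern, while A → E → F has only one intermediate and does not qualify. Keeping the fit/transform split means the enriched rows can be passed directly to a gradient-boosting model as extra feature columns.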