On the Potential of Network-Based Features for Fraud Detection

Catayoun Azarm, Erman Acar, Mickey van Zeelt

TL;DR

The paper tackles online transaction fraud detection by incorporating network-based features, specifically a personalised PageRank exposure score, into a logistic regression framework. It builds a directed weighted transaction graph from ING Netherlands data and demonstrates that adding the $ppr$ feature improves the AUC by about 2 percentage points over a baseline of six traditional features, with high feature importance for $ppr$ and channel-related attributes. Interpretability analyses and PSI stability checks indicate reliable, generalisable signals, while acknowledging limitations tied to a single network source. The study suggests future work on integrating multiple financial networks and exploring graph embeddings to broaden applicability and robustness in fraud detection.
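As a rough illustration of what a personalised PageRank exposure score on a directed weighted transaction graph can look like, here is a minimal pure-Python sketch. It is not the authors' implementation: the choice of known-fraud accounts as restart seeds, the damping factor `alpha=0.85`, the handling of accounts with no outgoing payments, and the toy account names are all assumptions made for this example.

```python
from collections import defaultdict


def personalised_pagerank(edges, seeds, alpha=0.85, iters=200, tol=1e-12):
    """Power-iteration PPR on a directed weighted graph.

    edges: iterable of (source, target, weight) tuples, e.g. payer, payee, amount.
    seeds: accounts the random walk restarts from (assumed here: known fraud).
    Returns a dict mapping each account to its score; scores sum to 1.
    """
    out = defaultdict(dict)
    nodes = set()
    for src, dst, weight in edges:
        nodes.update((src, dst))
        out[src][dst] = out[src].get(dst, 0.0) + weight

    seeds = [s for s in seeds if s in nodes]
    if not seeds:
        raise ValueError("at least one seed must appear in the graph")
    restart = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}

    score = dict(restart)
    for _ in range(iters):
        nxt = {n: (1.0 - alpha) * restart[n] for n in nodes}
        for n in nodes:
            total = sum(out[n].values()) if n in out else 0.0
            if total == 0.0:
                # Assumption: accounts without outgoing edges hand their
                # mass back to the seeds so the total stays at 1.
                for m in seeds:
                    nxt[m] += alpha * score[n] * restart[m]
            else:
                for m, weight in out[n].items():
                    nxt[m] += alpha * score[n] * weight / total
        delta = sum(abs(nxt[n] - score[n]) for n in nodes)
        score = nxt
        if delta < tol:
            break
    return score


# Toy graph with hypothetical accounts: money flows fraud -> a -> b,
# while c -> d is a separate component untouched by the seed.
toy_edges = [("fraud", "a", 100.0), ("a", "b", 50.0), ("c", "d", 10.0)]
scores = personalised_pagerank(toy_edges, seeds={"fraud"})
```

In this toy graph, account `a` (one hop from the seed) scores higher than `b` (two hops), and the disconnected accounts `c` and `d` score zero. The resulting per-account score could then be joined onto the transaction table as one extra column next to the traditional features.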

Abstract

Online transaction fraud presents substantial challenges to businesses and consumers, risking significant financial losses. Conventional rule-based systems struggle to keep pace with evolving fraud tactics, leading to high false positive rates and missed detections. Machine learning techniques offer a promising solution by leveraging historical data to identify fraudulent patterns. This article explores using the personalised PageRank (PPR) algorithm to capture the social dynamics of fraud by analysing relationships between financial accounts. The primary objective is to compare the performance of traditional features with the addition of PPR in fraud detection models. Results indicate that integrating PPR enhances the model's predictive power, surpassing the baseline model. Additionally, the PPR feature provides unique and valuable information, evidenced by its high feature importance score. Feature stability analysis confirms consistent feature distributions across training and test datasets.
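The feature stability analysis compares a feature's distribution in the training set against the test set. A common statistic for this is the population stability index (PSI). The sketch below is one conventional way to compute it, assuming equal-width bins derived from the training data and a small floor `eps` to avoid taking the logarithm of zero; the binning scheme and the sample data are assumptions for illustration, not details taken from the paper.

```python
import math


def psi(expected, actual, bins=10, eps=1e-6):
    """Population stability index between two samples of one feature.

    expected: reference sample (e.g. training set values).
    actual:   comparison sample (e.g. test set values).
    Bin edges are equal-width over the range of `expected`; values of
    `actual` outside that range are clamped into the first or last bin.
    """
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / bins or 1.0  # guard against a constant feature

    def shares(values):
        counts = [0] * bins
        for v in values:
            idx = int((v - lo) / width)
            counts[min(max(idx, 0), bins - 1)] += 1
        # Floor each share at eps so empty bins do not produce log(0).
        return [max(c / len(values), eps) for c in counts]

    e, a = shares(expected), shares(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))


# Hypothetical samples: an identical distribution and a shifted one.
train = [i / 100 for i in range(100)]
test_same = list(train)
test_shifted = [0.5 + i / 200 for i in range(100)]

psi_same = psi(train, test_same)
psi_shifted = psi(train, test_shifted)
```

Identical samples give a PSI of zero, and the value grows as the two distributions diverge. A widely used rule of thumb treats values below roughly 0.1 as stable, though the appropriate threshold depends on the application.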

Paper Structure

This paper contains 28 sections, 1 equation, 3 figures, 5 tables, 2 algorithms.

Figures (3)

  • Figure 1: Transaction Status Distribution.
  • Figure 2: LR_base (baseline model built on traditional features) vs. LR_ppr (enhanced model built on both the traditional features and the graph feature ppr).
  • Figure 3: Feature Contribution List: orange for graph features vs. blue for baseline features.