On the Potential of Network-Based Features for Fraud Detection
Catayoun Azarm, Erman Acar, Mickey van Zeelt
TL;DR
The paper tackles online transaction fraud detection by adding a network-based feature, a personalised PageRank exposure score ($ppr$), to a logistic regression model. It builds a directed, weighted transaction graph from ING Netherlands data and shows that adding $ppr$ to a baseline of six traditional features improves the AUC by about 2 percentage points, with $ppr$ and the channel-related attributes ranking high in feature importance. Interpretability analyses and population stability index (PSI) checks indicate that the signals are reliable and generalisable, although the authors acknowledge limitations tied to relying on a single network source. The study suggests future work on integrating multiple financial networks and exploring graph embeddings to broaden applicability and robustness in fraud detection.
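To make the exposure score concrete, the following is a minimal sketch of personalised PageRank on a small directed, weighted graph, computed by power iteration. It is not the authors' implementation: the toy account names, the damping factor `alpha=0.85`, the choice to restart only at known fraudulent accounts, and the edge direction (payer to payee) are all assumptions made for illustration, since this summary does not specify them.

```python
def personalised_pagerank(edges, seeds, alpha=0.85, max_iter=200, tol=1e-12):
    """Personalised PageRank by power iteration.

    edges: dict mapping (source, target) -> positive weight (e.g. amount sent).
    seeds: accounts the random walk restarts at (assumed: known fraud cases).
    alpha: probability of following an edge instead of restarting (assumed).
    """
    if not seeds:
        raise ValueError("at least one seed account is required")
    nodes = sorted({n for pair in edges for n in pair} | set(seeds))

    # Total outgoing weight per node, used to normalise edge weights.
    out_weight = {n: 0.0 for n in nodes}
    for (src, _dst), w in edges.items():
        out_weight[src] += w

    # Restart distribution: uniform over the seed accounts.
    restart = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    score = dict(restart)

    for _ in range(max_iter):
        nxt = {n: (1.0 - alpha) * restart[n] for n in nodes}
        # Nodes without outgoing edges send their mass back to the seeds.
        dangling = sum(score[n] for n in nodes if out_weight[n] == 0.0)
        for (src, dst), w in edges.items():
            nxt[dst] += alpha * score[src] * w / out_weight[src]
        for n in nodes:
            nxt[n] += alpha * dangling * restart[n]
        delta = sum(abs(nxt[n] - score[n]) for n in nodes)
        score = nxt
        if delta < tol:
            break
    return score


# Hypothetical toy graph: one known fraudulent account and four others.
edges = {
    ("fraud", "m1"): 500.0,
    ("m1", "m2"): 200.0,
    ("clean1", "clean2"): 50.0,
}
scores = personalised_pagerank(edges, seeds={"fraud"})
```

In this toy graph, accounts reachable from the seed receive a positive score that decays with distance, while accounts in the disconnected component receive zero. A score of this kind could then be appended as one extra column to the traditional features before fitting the classifier.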
Abstract
Online transaction fraud presents substantial challenges to businesses and consumers, risking significant financial losses. Conventional rule-based systems struggle to keep pace with evolving fraud tactics, leading to high false positive rates and missed detections. Machine learning techniques offer a promising solution by leveraging historical data to identify fraudulent patterns. This article explores using the personalised PageRank (PPR) algorithm to capture the social dynamics of fraud by analysing relationships between financial accounts. The primary objective is to compare the performance of traditional features with the addition of PPR in fraud detection models. Results indicate that integrating PPR enhances the model's predictive power, surpassing the baseline model. Additionally, the PPR feature provides unique and valuable information, evidenced by its high feature importance score. Feature stability analysis confirms consistent feature distributions across training and test datasets.
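The feature stability analysis mentioned above can be illustrated with a population stability index (PSI) computation. The sketch below assumes equal-width bins derived from the training data, ten bins, and a small floor `eps` to avoid taking the logarithm of zero; none of these choices are taken from the paper, and the sample data are synthetic.

```python
import math


def psi(expected, actual, n_bins=10, eps=1e-6):
    """Population stability index between two samples of one feature.

    expected: reference sample (e.g. training set values).
    actual:   comparison sample (e.g. test set values).
    Bin edges are equal-width over the range of `expected` (an assumption).
    """
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / n_bins or 1.0  # guard against a constant feature

    def shares(values):
        counts = [0] * n_bins
        for v in values:
            idx = int((v - lo) / width)
            idx = min(max(idx, 0), n_bins - 1)  # clamp out-of-range values
            counts[idx] += 1
        # Floor each share at eps so the logarithm below is defined.
        return [max(c / len(values), eps) for c in counts]

    e, a = shares(expected), shares(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))


# Synthetic example: identical samples versus a shifted sample.
train = [i / 100 for i in range(100)]
shifted = [0.5 + i / 200 for i in range(100)]
psi_same = psi(train, train)
psi_shifted = psi(train, shifted)
```

A commonly cited rule of thumb treats a PSI below 0.1 as stable and above 0.25 as a substantial shift; whether the paper applies these thresholds is not stated in this section.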
