Locally Differentially Private Embedding Models in Distributed Fraud Prevention Systems
Iker Perez, Jason Wong, Piotr Skalski, Stuart Burrell, Richard Mortier, Derek McAuley, David Sutton
TL;DR
The paper tackles privacy-preserving collaborative fraud prevention across financial institutions by publishing locally differentially private embeddings of transaction histories to inform externally hosted models. It introduces a distributed learning framework with two modes of operation (peer-to-peer and orchestrated) that combines latent transaction embeddings with additive noise to balance utility and privacy under the $\epsilon$-LDP paradigm. The authors formalize a data publication mechanism, analyze three attack classes (embedding inversion, attribute inference, and membership inference), and demonstrate that utility remains close to centralized baselines across a range of privacy budgets, on both synthetic SWIFT data and a real-world acquiring dataset. The work shows practical potential for privacy-preserving collaboration in finance, providing quantitative evidence of favorable utility-privacy trade-offs and outlining scalable training strategies for distributed fraud and anomaly detection systems.
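The additive-noise idea above can be illustrated with a minimal Python sketch. It assumes the standard Laplace mechanism applied to a single embedding vector after clipping it to a fixed L1 norm; the function names (`clip_l1`, `laplace_sample`, `privatize_embedding`) and the clipping bound are hypothetical, and this is not claimed to be the authors' exact mechanism, encoder, or noise calibration. Because any two clipped vectors differ by at most twice the L1 bound, noise of scale `2 * l1_bound / epsilon` per coordinate gives $\epsilon$-LDP for each released vector.

```python
import random
from typing import List, Optional


def clip_l1(vec: List[float], bound: float) -> List[float]:
    """Rescale vec so that its L1 norm is at most `bound` (hypothetical clipping step)."""
    norm = sum(abs(v) for v in vec)
    if norm <= bound:
        return list(vec)
    factor = bound / norm
    return [v * factor for v in vec]


def laplace_sample(scale: float, rng: random.Random) -> float:
    """Draw from Laplace(0, scale) as the difference of two Exp(rate=1/scale) draws."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def privatize_embedding(embedding: List[float], epsilon: float, l1_bound: float,
                        rng: Optional[random.Random] = None) -> List[float]:
    """Release an epsilon-LDP copy of one embedding via the Laplace mechanism.

    Assumption: the embedding is clipped to L1 norm <= l1_bound, so any two
    possible inputs differ by at most 2 * l1_bound in L1 norm (the local
    sensitivity bound). Laplace noise with scale 2 * l1_bound / epsilon on each
    coordinate then satisfies epsilon-LDP.
    """
    if epsilon <= 0 or l1_bound <= 0:
        raise ValueError("epsilon and l1_bound must be positive")
    rng = rng or random.Random()
    clipped = clip_l1(embedding, l1_bound)
    scale = 2.0 * l1_bound / epsilon
    return [v + laplace_sample(scale, rng) for v in clipped]


if __name__ == "__main__":
    # Hypothetical embedding produced by some transaction-sequence encoder.
    z = [0.8, -1.5, 0.3, 2.0]
    rng = random.Random(0)
    for eps in (0.5, 1.0, 5.0, 10.0):
        released = privatize_embedding(z, epsilon=eps, l1_bound=1.0, rng=rng)
        # Smaller epsilon -> larger noise scale -> stronger privacy, lower utility.
        print(f"epsilon={eps:>4}: noise scale={2.0 / eps:.2f}, released={released}")
```

The expected absolute noise per coordinate equals the Laplace scale, so halving $\epsilon$ doubles the distortion of each released coordinate; this is the utility-privacy trade-off in its simplest form. A Gaussian mechanism would be a common alternative, but it yields the weaker $(\epsilon, \delta)$ guarantee rather than pure $\epsilon$-LDP.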
Abstract
Global financial crime activity is driving demand for machine learning solutions in fraud prevention. However, prevention systems are commonly provided to financial institutions in isolation, and few provisions exist for data sharing due to fears of unintentional leaks and adversarial attacks. Collaborative learning advances in finance are rare, and it is hard to find real-world insights derived from privacy-preserving data processing systems. In this paper, we present a collaborative deep learning framework for fraud prevention, designed from a privacy standpoint and awarded at the recent PETs Prize Challenges. We leverage latent embedded representations of variable-length transaction sequences, along with local differential privacy, to construct a data release mechanism that can securely inform externally hosted fraud and anomaly detection models. We assess our contribution on two distributed datasets donated by large payment networks, and demonstrate robustness to popular inference-time attacks, along with utility-privacy trade-offs comparable to those reported in published work from other application domains.
