Locally Differentially Private Embedding Models in Distributed Fraud Prevention Systems

Iker Perez; Jason Wong; Piotr Skalski; Stuart Burrell; Richard Mortier; Derek McAuley; David Sutton

TL;DR

The paper tackles the challenge of privacy-preserving collaborative fraud prevention across financial institutions by publishing locally differentially private embeddings of transaction histories to inform externally hosted models. It introduces a distributed learning framework with two operation modes (peer-to-peer and orchestrated), leveraging latent transaction embeddings and additive noise to balance utility and privacy under the $\epsilon$-LDP paradigm. The authors formalize a data publication mechanism, analyze three attack classes (embedding inversion, attribute inference, and membership), and demonstrate that utility remains close to centralized baselines for both synthetic SWIFT data and a real-world acquiring dataset across a range of privacy budgets. The work shows practical potential for privacy-preserving collaboration in finance, providing quantitative evidence of favorable utility-privacy trade-offs and outlining scalable training strategies for distributed fraud and anomaly detection systems.

Abstract

Global financial crime activity is driving demand for machine learning solutions in fraud prevention. However, prevention systems are commonly serviced to financial institutions in isolation, and few provisions exist for data sharing due to fears of unintentional leaks and adversarial attacks. Collaborative learning advances in finance are rare, and it is hard to find real-world insights derived from privacy-preserving data processing systems. In this paper, we present a collaborative deep learning framework for fraud prevention, designed from a privacy standpoint, and awarded at the recent PETs Prize Challenges. We leverage latent embedded representations of varied-length transaction sequences, along with local differential privacy, in order to construct a data release mechanism which can securely inform externally hosted fraud and anomaly detection models. We assess our contribution on two distributed data sets donated by large payment networks, and demonstrate robustness to popular inference-time attacks, along with utility-privacy trade-offs analogous to published work in alternative application domains.
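The release mechanism described above combines a latent embedding with additive noise. A minimal sketch of that idea follows, assuming a textbook Laplace mechanism on an L1-clipped vector; the function names, the clipping bound, and the choice of Laplace noise are illustrative assumptions, not the authors' published mechanism.

```python
import numpy as np


def clip_l1(z, bound):
    """Rescale z so that its L1 norm is at most `bound` (hypothetical helper)."""
    norm = np.abs(z).sum()
    if norm > bound:
        z = z * (bound / norm)
    return z


def privatize_embedding(z, epsilon, bound=1.0, rng=None):
    """Release a noisy copy of embedding z (illustrative, not the paper's code).

    After clipping, any two embeddings differ by at most 2 * bound in L1 norm,
    so per-coordinate Laplace noise with scale 2 * bound / epsilon satisfies
    epsilon-LDP for the clipped vector.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    rng = np.random.default_rng() if rng is None else rng
    z = clip_l1(np.asarray(z, dtype=float), bound)
    scale = 2.0 * bound / epsilon
    return z + rng.laplace(loc=0.0, scale=scale, size=z.shape)


# Usage: a random stand-in for a transaction-sequence embedding.
rng = np.random.default_rng(0)
embedding = rng.normal(size=16)
released = privatize_embedding(embedding, epsilon=1.0, rng=rng)
```

A smaller `epsilon` yields a larger noise scale, which is the utility-privacy trade-off the abstract refers to.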

Paper Structure

This paper contains 20 sections, 18 equations, 5 figures, 3 tables, 2 algorithms.

Introduction
Fraud Detection in Financial Transactions and Payments
Decomposing state-of-the-art classifiers
Privacy-preserving collaborative fraud prevention
Embedding Models and Privacy
$\epsilon$-Local Differential Privacy
Additive Noise Mechanisms
Private Embedding Models for Distributed Anomaly and Fraud Detection
Transfer Learning with Pre-Trained Embeddings
End-to-End Training in Orchestrated Systems
Threat Models and Privacy Attacks
Embedding Inversion Attack
Attribute Inference Attack
Membership Attack
Experimental Results
...and 5 more sections

Figures (5)

Figure 1: Schematic of a modern fraud classifier. To score a transaction $\textbf{x}_{t_6}$ at time $t_6$, historic data for ordering and beneficiary accounts is ingested, formatted and independently aggregated over time. The resulting profiles $\textbf{z}^\text{o}_{t_6}$ and $\textbf{z}^\text{b}_{t_6}$ are processed by a binary classifier, producing a score.
Figure 2: Dual encoding procedure for transaction sequences. Forward and backward embeddings are combined into a contrastive loss function.
Figure 3: Diagram of the PETs Prize Challenges setting. To explore anomaly detection, SWIFT transaction data may be augmented by requesting personal account information from banking institutions.
Figure 4: Diagram of collaborative setting. Card holders transact with merchants managed by the acquirer, which accepts responsibility to assess fraud risk. Issuing institutions agree to share privacy-preserving insights for card holders.
Figure 5: Fraud detection performance metrics and privacy attack success rates for a large acquiring institution, evaluated on a hold-out period.

Theorems & Definitions (2)

Definition 1: $\epsilon$-differential privacy
Definition 2: $\epsilon$-local differential privacy
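The statements of these two definitions are not reproduced in this listing. For reference, the standard formulations are given below; the symbols $\mathcal{M}$, $D$, $D'$, $x$, $x'$ and $S$ are notation assumed here and may differ from the paper's.

$$
\text{($\epsilon$-DP)} \quad \Pr[\mathcal{M}(D) \in S] \le e^{\epsilon} \, \Pr[\mathcal{M}(D') \in S] \quad \text{for all data sets } D, D' \text{ differing in one record and all measurable } S,
$$

$$
\text{($\epsilon$-LDP)} \quad \Pr[\mathcal{M}(x) \in S] \le e^{\epsilon} \, \Pr[\mathcal{M}(x') \in S] \quad \text{for all inputs } x, x' \text{ and all measurable } S.
$$

In the local model the guarantee holds for any pair of individual inputs, so each data owner can apply the randomized mechanism $\mathcal{M}$ before anything leaves its premises.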
