Fed-RD: Privacy-Preserving Federated Learning for Financial Crime Detection

Md. Saikat Islam Khan, Aparna Gupta, Oshani Seneviratne, Stacy Patterson

TL;DR

Fed-RD tackles end-to-end privacy in federated learning for relational financial data partitioned both vertically and horizontally. It introduces two fusion approaches: Approach 1 applies local Rényi differential privacy (RDP) by adding Gaussian noise to embeddings, while Approach 2 uses PBM and MPC to compute a private embedding sum; both yield formal $(\alpha, \epsilon)$-RDP guarantees over $Q$ iterations. The method models the data with three components: a transaction model with parameters $\boldsymbol{\theta}_T$, an account model with parameters $\boldsymbol{\theta}_B$, and a fusion model with parameters $\boldsymbol{\theta}_F$. Together they are trained to minimize the loss $\mathcal{L}(\boldsymbol{\Theta}; \mathcal{X}_T, \mathcal{X}_B, \mathbf{y})$. Experiments on SWIFT-like and AMLSim synthetic datasets show that privacy-preserving Fed-RD attains high predictive performance while maintaining formal privacy guarantees and outperforming a baseline XGBoost model on transaction features.
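To make the Approach 1 idea concrete, the following is a minimal sketch of Gaussian noise on a clipped embedding together with the textbook RDP accounting for the Gaussian mechanism. It is not the paper's algorithm or its bound: the function names, the clipping step, and the plain additive composition over $Q$ iterations (with no subsampling amplification) are assumptions made for illustration. The formulas used are the standard ones: a Gaussian mechanism with L2 sensitivity $\Delta$ and noise scale $\sigma$ satisfies $(\alpha, \alpha\Delta^2 / (2\sigma^2))$-RDP, RDP parameters add under composition, and $(\alpha, \epsilon)$-RDP implies $(\epsilon + \log(1/\delta)/(\alpha - 1), \delta)$-DP.

```python
import math
import random


def privatize_embedding(embedding, clip_norm, sigma, rng):
    """Clip an embedding to L2 norm `clip_norm`, then add N(0, sigma^2) noise
    to each coordinate. Hypothetical helper, not taken from the paper."""
    norm = math.sqrt(sum(v * v for v in embedding))
    scale = 1.0 if norm == 0.0 else min(1.0, clip_norm / norm)
    return [v * scale + rng.gauss(0.0, sigma) for v in embedding]


def gaussian_rdp_epsilon(alpha, sensitivity, sigma):
    """RDP of the Gaussian mechanism: eps(alpha) = alpha * Delta^2 / (2 sigma^2)."""
    if alpha <= 1.0:
        raise ValueError("alpha must be greater than 1")
    if sigma <= 0.0:
        raise ValueError("sigma must be positive")
    return alpha * sensitivity ** 2 / (2.0 * sigma ** 2)


def compose_rdp(eps_per_iteration, num_iterations):
    """RDP at a fixed alpha composes additively across iterations."""
    return eps_per_iteration * num_iterations


def rdp_to_dp(alpha, eps, delta):
    """Convert (alpha, eps)-RDP to (eps', delta)-DP via the standard bound."""
    return eps + math.log(1.0 / delta) / (alpha - 1.0)


if __name__ == "__main__":
    rng = random.Random(0)
    clip_norm, sigma, alpha, Q = 1.0, 4.0, 8.0, 100
    noisy = privatize_embedding([0.3, -1.2, 0.7], clip_norm, sigma, rng)
    # In the local model, replacing one input can move the clipped
    # embedding by at most 2 * clip_norm in L2 norm.
    eps_one = gaussian_rdp_epsilon(alpha, 2.0 * clip_norm, sigma)
    eps_total = compose_rdp(eps_one, Q)
    print(noisy, eps_total, rdp_to_dp(alpha, eps_total, 1e-5))
```

A larger `sigma` lowers the per-iteration $\epsilon$ quadratically, which is the lever behind the privacy versus accuracy trade-off that the experiments explore.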

Abstract

We introduce Federated Learning for Relational Data (Fed-RD), a novel privacy-preserving federated learning algorithm specifically developed for financial transaction datasets partitioned vertically and horizontally across parties. Fed-RD strategically employs differential privacy and secure multiparty computation to guarantee the privacy of training data. We provide theoretical analysis of the end-to-end privacy of the training algorithm and present experimental results on realistic synthetic datasets. Our results demonstrate that Fed-RD achieves high model accuracy with minimal degradation as privacy increases, while consistently surpassing benchmark results.

Paper Structure

This paper contains 29 sections, 2 theorems, 12 equations, 4 figures, 3 tables, 2 algorithms.

Key Result

Theorem 5.1

Let $\beta' \in [0, \frac{1}{4}]$, $b' \in \mathbb{N}$, and $\alpha \leq 2$. After $Q$ iterations, Algorithm vflmpc provides $(\alpha, \epsilon_T(\alpha))$-RDP for transaction features and $(\alpha, \epsilon_B(\alpha))$-RDP for bank account features, where $P$ is the embedding size, $B$ is the transaction batch size, $M_T$ is the maximum number of transactions in which a sing…
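The parameter ranges in the theorem ($\beta' \in [0, \frac{1}{4}]$, $b' \in \mathbb{N}$) match those of the Poisson Binomial Mechanism, so the sketch below assumes that "PBM" refers to that mechanism and shows how a private sum of scalar values could be estimated with it. This is an illustration, not the paper's Algorithm vflmpc: the function names are hypothetical, the secure-aggregation (MPC) step is simulated by an ordinary sum, and only one embedding coordinate is handled. Each party clips its value to $[-c, c]$, draws a Binomial$(b, \frac{1}{2} + \beta x / c)$ sample, and only the sum of the samples is revealed to the decoder.

```python
import random


def pbm_encode(x, clip, beta, b, rng):
    """Encode one scalar as a Binomial(b, 1/2 + beta * x / clip) sample."""
    if not 0.0 <= beta <= 0.25:
        raise ValueError("beta must lie in [0, 1/4]")
    if b < 1:
        raise ValueError("b must be a positive integer")
    x = max(-clip, min(clip, x))  # clip the input to [-clip, clip]
    p = 0.5 + beta * x / clip
    return sum(1 for _ in range(b) if rng.random() < p)


def pbm_decode_sum(total, num_parties, clip, beta, b):
    """Unbiased estimate of the sum of the clipped inputs.

    E[total] = num_parties * b / 2 + (b * beta / clip) * sum(x_i),
    so solving for sum(x_i) gives the expression below.
    """
    if beta <= 0.0:
        raise ValueError("beta must be positive to decode")
    return clip / (b * beta) * (total - num_parties * b / 2.0)


if __name__ == "__main__":
    rng = random.Random(0)
    values = [0.5, -0.2, 0.9]
    clip, beta, b = 1.0, 0.25, 1024
    encoded = [pbm_encode(v, clip, beta, b, rng) for v in values]
    # Stand-in for MPC: in a real deployment only this sum would be revealed.
    total = sum(encoded)
    print(pbm_decode_sum(total, len(values), clip, beta, b))
```

A smaller `beta` pulls every sampling probability toward $\frac{1}{2}$, which hides the inputs better but makes the decoded sum noisier; a larger `b` reduces the estimation variance at the cost of more communication and a weaker privacy guarantee.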

Figures (4)

  • Figure 1: Illustration of data and model partitioning among the transaction party and banks.
  • Figure 2: Testing AUPRC on the SWIFT dataset for various values of $\beta$, with $b = 64$ and $b' = 1024$.
  • Figure 3: Testing AUPRC on the AMLSim dataset for various values of $\beta$, with $b = 64$ and $b' = 1024$.
  • Figure 4: Comparison between Adam and SGD optimizer.

Theorems & Definitions (5)

  • Definition 2.1: Rényi divergence
  • Definition 2.2: Rényi Differential Privacy (RDP)
  • Definition 2.3: Local Rényi Differential Privacy (Local RDP)
  • Theorem 5.1
  • Theorem 5.2