Starlit: Privacy-Preserving Federated Learning to Enhance Financial Fraud Detection

Aydin Abadi; Bradley Doyle; Francesco Gini; Kieron Guinamard; Sasi Kumar Murakonda; Jack Liddell; Paul Mellor; Steven J. Murdoch; Mohammad Naseri; Hector Page; George Theodorakopoulos; Suzanne Weller

TL;DR

Starlit presents a scalable privacy-preserving federated learning framework for cross-institutional financial fraud detection, combining Private Set Intersection, Local Differential Privacy, and SecureBoost within the Flower platform to operate on data partitioned both horizontally and vertically. It provides a formal security definition (Celestial) and a simulation-based proof, along with a practical end-to-end implementation that tolerates client dropouts and does not require prior account freezing. The system introduces two novel capabilities: securely identifying discrepancies in shared features across clients and aggregating per-user flags with privacy protections, enabling richer features for anomaly detection. Empirical results on synthetic, large-scale financial datasets demonstrate Starlit's scalability, efficiency, and competitive accuracy, with broader applicability to terrorism mitigation, digital health, and benefit-fraud detection. The work offers a blueprint for privacy-preserving, multi-party collaboration in regulated domains, showing that rigorous security can coexist with practical performance.

Abstract

Federated Learning (FL) is a data-minimization approach enabling collaborative model training across diverse clients with local data, avoiding direct data exchange. However, state-of-the-art FL solutions to identify fraudulent financial transactions exhibit a subset of the following limitations. They (1) lack a formal security definition and proof, (2) assume prior freezing of suspicious customers' accounts by financial institutions (limiting the solutions' adoption), (3) scale poorly, involving either $O(n^2)$ computationally expensive modular exponentiation (where $n$ is the total number of financial institutions) or highly inefficient fully homomorphic encryption, (4) assume the parties have already completed the identity alignment phase, hence excluding it from the implementation, performance evaluation, and security analysis, and (5) struggle to resist clients' dropouts. This work introduces Starlit, a novel scalable privacy-preserving FL mechanism that overcomes these limitations. It has various applications, such as enhancing financial fraud detection, mitigating terrorism, and enhancing digital health. We implemented Starlit and conducted a thorough performance analysis using synthetic data from a key player in global financial transactions. The evaluation indicates Starlit's scalability, efficiency, and accuracy.
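The TL;DR names Local Differential Privacy as one of Starlit's building blocks, and the paper's preliminaries list Randomized Response as one way to obtain it. As a rough illustration only, the sketch below shows the textbook binary randomized response applied to a per-user flag, together with the standard debiasing step for an aggregate rate. It is not taken from the paper: the function names `randomized_response` and `estimate_rate` are hypothetical, and Starlit's actual mechanism and parameters may differ.

```python
import math
import random


def randomized_response(bit: int, epsilon: float, rng: random.Random) -> int:
    """Textbook binary randomized response (an assumption, not the paper's code).

    The true bit is reported with probability e^eps / (1 + e^eps);
    otherwise it is flipped. This satisfies eps-local differential privacy.
    """
    p_truth = math.exp(epsilon) / (1.0 + math.exp(epsilon))
    return bit if rng.random() < p_truth else 1 - bit


def estimate_rate(reports: list[int], epsilon: float) -> float:
    """Debias the observed fraction of 1s to estimate the true fraction.

    If pi is the true rate and p the truth probability, then
    E[observed] = pi * (2p - 1) + (1 - p); solving for pi gives the estimator.
    Requires epsilon > 0 so that 2p - 1 is non-zero.
    """
    p = math.exp(epsilon) / (1.0 + math.exp(epsilon))
    observed = sum(reports) / len(reports)
    return (observed - (1.0 - p)) / (2.0 * p - 1.0)
```

A smaller `epsilon` flips more flags and so gives stronger privacy, at the cost of a noisier estimate. This is the kind of privacy versus utility trade-off that the Figure 4 caption describes in terms of AUPRC.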

Paper Structure

This paper contains 68 sections, 1 theorem, 20 equations, 7 figures, 4 tables, and 1 algorithm.

Introduction
Our Contributions
Primary Goals and Setting
Related Work
Informal Threat Model
Preliminaries
Notations and Assumptions
Private Set Intersection (PSI)
Local Differential Privacy
Randomized Response.
Laplace Noise with Post-Processing.
Mechanisms for Optimal Inference Privacy.
Federated Learning
SecureBoost: A Lossless Vertical Federated Learning Framework.
$\text{Flower}$: A Federated Learning Implementation Platform
...and 53 more sections

Key Result

theorem 1

Let $\mathcal{F}$ be the functionality defined in Relation equ::func-cel. Moreover, let $\mathcal{L}_{1}(inp)$, $\mathcal{L}_{2}(inp)$, and $\mathcal{L}_{i+2}(inp)$ be the leakages defined in Definitions def:fsp-Side-Leakage, def:fc-Side-Leakage, and def:Bank-Side-Leaka

Figures (7)

Figure 1: Outline of parties' interactions in $\textit{Starlit}$.
Figure 2: Bayesian game for a single flag value.
Figure 3: PSI-based method to identify discrepancies.
Figure 4: Plot \ref{fig:rr-laplace} compares the effect on the model's AUPRC of using RR and the Laplace mechanism with post-processing to achieve LDP. Plot \ref{fig:game-rr} compares the effect on AUPRC of using RR and privacy mechanisms at the same value of $\epsilon$, but with the constraint of a 10% lower probability of converting 0 to 1 (or 1 to 0) than RR recommends. In Plot \ref{fig:rr-laplace}: red dotted line, non-private train; blue dotted line, non-private test; solid blue line, RR train; solid orange line, RR test; solid green line, Laplace train; solid red line, Laplace test. In Plot \ref{fig:game-rr}: red dotted line, non-private train; blue dotted line, non-private test; solid blue line, RR train; solid orange line, RR test; solid green line, 10% less $0 \to 1$ than RR, train; solid red line, 10% less $0 \to 1$ than RR, test; solid purple line, 10% less $1 \to 0$ than RR, train; solid brown line, 10% less $1 \to 0$ than RR, test.
Figure 5: Comparing the AUPRC of the baseline and different settings of $\textit{Starlit}$. A bar's label for $\textit{Starlit}$ is a concatenation of (1) direct sampling rate, (2) GOSS, (3) tree's depth, and (4) maximum message size.
...and 2 more figures

Theorems & Definitions (7)

Definition 1
definition 1
definition 2: Security of Celestial
definition 3: $\text{Srv}$-Side Leakage
definition 4: $\text{FC}$-Side Leakage
definition 5: $\text{C}_{i}$-Side Leakage
theorem 1
