Distributed Monitoring for Data Distribution Shifts in Edge-ML Fraud Detection

Nader Karayanni; Robert J. Shahla; Chieh-Lien Hsiao

TL;DR

The paper tackles data distribution shift in edge-ML fraud detection by proposing an open-source framework that continuously monitors drift across a network of edge devices using a distributed Kolmogorov-Smirnov (KS) statistic computed from per-edge $t$-digests. A Python-based client-server architecture produces compact, mergeable representations of local distributions that a serverless backend aggregates, minimizing bandwidth while delivering accurate estimates of the KS statistic $KS(F_1,F_2)=\sup_x|F_1(x)-F_2(x)|$, where $F_1$ and $F_2$ are the cumulative distribution functions being compared. Extensive experiments on real-world and synthetic financial datasets show that the distributed approach (T-Digest-KS) closely matches the fully centralized KS computation (Optimal-KS), with median errors below $0.004$, and maintains low false-positive and false-negative rates, while offering scalable backend performance and reduced client overhead. The work advances practical, privacy-conscious monitoring for edge-ML fraud systems and lays groundwork for future privacy-preserving and sliding-window extensions in holistic edge monitoring frameworks.
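The distributed KS computation summarized above can be sketched as follows. This is a minimal illustration under stated assumptions, not the framework's implementation: it swaps the $t$-digest for a simplified centroid summary (the helpers `compress`, `build_digest`, `merge_digests`, `digest_cdf`, and `ks_from_digests` are hypothetical) that caps every centroid at a fixed share of the total weight, whereas a real $t$-digest uses a scale function to keep tail centroids small. Each edge summarizes its values locally, the backend merges the summaries, and the KS statistic is taken over the merged step CDFs; `ks_exact` plays the role of the centralized Optimal-KS baseline on raw samples.

```python
import bisect
import random

# Simplified stand-in for a t-digest (an assumption, not the real algorithm):
# a sorted list of (mean, weight) centroids. A genuine t-digest sizes centroids
# with a scale function so the tails stay precise; here each merged centroid is
# simply capped at total_weight / max_centroids.


def compress(centroids, max_centroids=100):
    """Greedily merge neighbouring centroids (in mean order) under a weight cap."""
    centroids = sorted(centroids)
    total = sum(w for _, w in centroids)
    cap = total / max_centroids
    out = []
    for mean, weight in centroids:
        if out and out[-1][1] + weight <= cap:
            m, w = out[-1]
            out[-1] = ((m * w + mean * weight) / (w + weight), w + weight)
        else:
            out.append((mean, weight))
    return out


def build_digest(values, max_centroids=100):
    """Client side: summarize one edge device's local feature values."""
    return compress([(float(v), 1.0) for v in values], max_centroids)


def merge_digests(digests, max_centroids=100):
    """Backend side: combine per-edge digests into one network-wide digest."""
    return compress([c for d in digests for c in d], max_centroids)


def digest_cdf(digest):
    """Step CDF that places each centroid's full weight at its mean."""
    means = [m for m, _ in digest]
    cum, running = [], 0.0
    for _, w in digest:
        running += w
        cum.append(running)
    total = running

    def cdf(x):
        i = bisect.bisect_right(means, x)
        return cum[i - 1] / total if i else 0.0

    return cdf


def ks_from_digests(d1, d2):
    """Approximate KS: both CDFs are step functions, so the sup is hit at a centroid mean."""
    f1, f2 = digest_cdf(d1), digest_cdf(d2)
    points = [m for m, _ in d1] + [m for m, _ in d2]
    return max(abs(f1(x) - f2(x)) for x in points)


def ks_exact(sample1, sample2):
    """Centralized baseline on the raw samples (the role Optimal-KS plays)."""
    s1, s2 = sorted(sample1), sorted(sample2)
    n1, n2 = len(s1), len(s2)
    return max(
        abs(bisect.bisect_right(s1, x) / n1 - bisect.bisect_right(s2, x) / n2)
        for x in s1 + s2
    )


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical setup: 10 edge devices, 500 values each per window.
    ref_edges = [[rng.gauss(0.0, 1.0) for _ in range(500)] for _ in range(10)]
    cur_edges = [[rng.gauss(0.5, 1.0) for _ in range(500)] for _ in range(10)]
    ref = merge_digests([build_digest(e) for e in ref_edges])
    cur = merge_digests([build_digest(e) for e in cur_edges])
    approx = ks_from_digests(ref, cur)
    exact = ks_exact([v for e in ref_edges for v in e], [v for e in cur_edges for v in e])
    print(f"digest KS = {approx:.4f}, exact KS = {exact:.4f}")
```

Because only centroid lists leave each device, the upload size scales with the number of centroids rather than the number of transactions; the price is an approximation error that shrinks as more centroids are kept.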

Abstract

The digital era has seen a marked increase in financial fraud. Edge ML has emerged as a promising solution for fraud detection in smartphone payment services, enabling the deployment of ML models directly on edge devices. This approach allows more personalized, real-time fraud detection. However, a significant gap in current research is the lack of a robust system for monitoring data distribution shifts in these distributed edge ML applications. Our work bridges this gap by introducing a novel open-source framework designed for continuous monitoring of data distribution shifts on a network of edge devices. Our system includes an innovative calculation of the Kolmogorov-Smirnov (KS) test over a distributed network of edge devices, enabling efficient and accurate monitoring of shifts in user behavior. We comprehensively evaluate the proposed framework using both real-world and synthetic financial transaction datasets and demonstrate its effectiveness.
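Turning a KS statistic into a shift alarm needs a rejection threshold. The abstract does not state the framework's decision rule, so the sketch below assumes the standard asymptotic two-sample rule: flag a shift when $D > c(\alpha)\sqrt{(n+m)/(nm)}$ with $c(\alpha)=\sqrt{-\ln(\alpha/2)/2}$, where $n$ and $m$ are the reference and current sample counts (in a digest-based setting, the total centroid weights). The names `ks_critical_value` and `is_shifted` are hypothetical.

```python
import math


def ks_critical_value(n, m, alpha=0.05):
    """Asymptotic two-sample KS threshold: c(alpha) * sqrt((n + m) / (n * m))."""
    if not 0.0 < alpha < 1.0 or n <= 0 or m <= 0:
        raise ValueError("need 0 < alpha < 1 and positive sample counts")
    c_alpha = math.sqrt(-math.log(alpha / 2.0) / 2.0)
    return c_alpha * math.sqrt((n + m) / (n * m))


def is_shifted(ks_stat, n, m, alpha=0.05):
    """Flag a distribution shift when the KS statistic exceeds the threshold."""
    return ks_stat > ks_critical_value(n, m, alpha)


if __name__ == "__main__":
    # Hypothetical windows: 5,000 reference and 5,000 current transactions.
    threshold = ks_critical_value(5000, 5000)
    print(f"threshold = {threshold:.4f}")
    print(is_shifted(0.19, 5000, 5000), is_shifted(0.01, 5000, 5000))
```

With 5,000 transactions in each window and $\alpha=0.05$, this threshold is roughly 0.027. It ignores the approximation error of any distributed sketch, so a deployment using estimated statistics might add a margin on top.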

Paper Structure (22 sections, 1 equation, 7 figures)

Introduction
Background and Related Work
Edge ML for Fraud Detection
Handling Data Distribution Shift
Two Distributions Comparison
$t$-digest Data Structure
Proposed Framework
High-level Overview
Framework Implementation
Client
Backend
Evaluation
Datasets
Accuracy
Real-World Dataset Evaluation
...and 7 more sections

Figures (7)

Figure 1: KS-statistic visualization [enwiki:ks].
Figure 2: Framework overview
Figure 3: Handling queue message.
Figure 4: Real-world dataset accuracy with different shifts.
Figure 5: Accuracy with a percentage of shifted users.
...and 2 more figures
