Distributed Monitoring for Data Distribution Shifts in Edge-ML Fraud Detection
Nader Karayanni, Robert J. Shahla, Chieh-Lien Hsiao
TL;DR
The paper tackles data distribution shift in edge-ML fraud detection by proposing an open-source framework that continuously monitors drift across a network of edge devices, using a distributed Kolmogorov-Smirnov (KS) statistic computed from per-edge $t$-digests. A Python-based client-server architecture gives each device a compact, mergeable representation of its local distribution and aggregates these representations in a serverless backend, minimizing bandwidth while delivering accurate estimates of the KS statistic $KS(F_1,F_2)=\sup_x|F_1(x)-F_2(x)|$. Experiments on real-world and synthetic financial datasets show that the distributed approach (T-Digest-KS) closely matches the fully centralized baseline (Optimal-KS), with median errors below $0.004$, and maintains low false-positive and false-negative rates, while offering scalable backend performance and reduced client overhead. The work advances practical, privacy-conscious monitoring for edge-ML fraud systems and lays groundwork for future privacy-preserving and sliding-window extensions in holistic edge monitoring frameworks.
Abstract
The digital era has seen a marked increase in financial fraud. Edge ML has emerged as a promising solution for fraud detection in smartphone payment services, enabling the deployment of ML models directly on edge devices. This approach enables more personalized, real-time fraud detection. However, a significant gap in current research is the lack of a robust system for monitoring data distribution shifts in these distributed edge ML applications. Our work bridges this gap by introducing a novel open-source framework designed for continuous monitoring of data distribution shifts on a network of edge devices. Our system includes an innovative calculation of the Kolmogorov-Smirnov (KS) test over a distributed network of edge devices, enabling efficient and accurate monitoring of shifts in users' behavior. We comprehensively evaluate the proposed framework using both real-world and synthetic financial transaction datasets and demonstrate the framework's effectiveness.
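To make the idea of a distributed KS computation concrete, the following is a minimal sketch and not the framework's implementation. As a loudly stated assumption, it replaces the $t$-digest with a much simpler mergeable summary, a fixed-bin histogram whose bin edges are shared by every device, so it only illustrates the pattern of summarizing locally, merging on the server, and comparing CDFs. All function names (`make_edges`, `summarize`, `merge`, `approx_ks`, `exact_ks`) and the simulated transaction amounts are hypothetical.

```python
import bisect
import random


def make_edges(lo, hi, n_bins):
    """Bin edges shared by all devices (assumption: value range known up front)."""
    width = (hi - lo) / n_bins
    return [lo + i * width for i in range(n_bins + 1)]


def summarize(values, edges):
    """Per-device summary: one count per bin; out-of-range values are clipped."""
    counts = [0] * (len(edges) - 1)
    for v in values:
        i = bisect.bisect_right(edges, v) - 1
        i = min(max(i, 0), len(counts) - 1)
        counts[i] += 1
    return counts


def merge(summaries):
    """Server-side merge: histograms with shared edges add bin by bin."""
    return [sum(col) for col in zip(*summaries)]


def cdf_at_edges(counts):
    """Cumulative fraction of values up to the right edge of each bin."""
    total = sum(counts)
    cdf, running = [], 0
    for c in counts:
        running += c
        cdf.append(running / total)
    return cdf


def approx_ks(counts_a, counts_b):
    """KS estimate: largest CDF gap, evaluated only at the shared bin edges."""
    pairs = zip(cdf_at_edges(counts_a), cdf_at_edges(counts_b))
    return max(abs(a - b) for a, b in pairs)


def exact_ks(xs, ys):
    """Centralized KS: sup_x |F1(x) - F2(x)| over the raw samples."""
    xs, ys = sorted(xs), sorted(ys)
    return max(
        abs(bisect.bisect_right(xs, p) / len(xs) - bisect.bisect_right(ys, p) / len(ys))
        for p in xs + ys
    )


if __name__ == "__main__":
    random.seed(0)
    edges = make_edges(0.0, 10.0, 200)
    # Reference window versus four devices whose data has shifted slightly.
    reference = [random.gauss(5.0, 1.0) for _ in range(2000)]
    devices = [[random.gauss(5.5, 1.0) for _ in range(500)] for _ in range(4)]
    merged = merge([summarize(d, edges) for d in devices])
    print("approx KS:", approx_ks(summarize(reference, edges), merged))
    print("exact KS: ", exact_ks(reference, [v for d in devices for v in d]))
```

Because the estimate inspects the CDF gap only at bin edges, it can never exceed the centralized value, and its accuracy depends on the bin width. A $t$-digest avoids fixing the edges in advance and concentrates resolution adaptively, which is one reason a sketch like this should be read as an illustration of mergeability rather than a substitute for the method described above.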
