Fin-Fed-OD: Federated Outlier Detection on Financial Tabular Data
Dayananda Herurkar, Sebastian Palacio, Ahmed Anwar, Joern Hees, Andreas Dengel
TL;DR
Fin-Fed-OD tackles open-world anomaly detection in privacy-sensitive financial tabular data by combining representation learning with federated learning (FL). Each client trains its own autoencoder to learn latent representations, which outlier detection (OD) models then use to refine the inlier decision boundary; only model parameters are shared, through FL aggregation (FedAvg/FedProx). Across tabular and image datasets, in experiments that include DAGMM and MemAE baselines, FL-based approaches detect unknown outliers better without degrading known-outlier performance, as supported by quantitative average precision (AP) improvements and qualitative latent-space clustering. This privacy-preserving collaboration enables robust, client-discriminating OD with practical implications for financial fraud detection and cross-organization anomaly resilience.
Abstract
Anomaly detection in real-world scenarios poses challenges due to dynamic and often unknown anomaly distributions, requiring robust methods that operate under an open-world assumption. This challenge is exacerbated in practical settings, where models are employed by private organizations, precluding data sharing due to privacy and competitive concerns. Despite potential benefits, the sharing of anomaly information across organizations is restricted. This paper addresses the question of enhancing outlier detection within individual organizations without compromising data confidentiality. We propose a novel method leveraging representation learning and federated learning techniques to improve the detection of unknown anomalies. Specifically, our approach utilizes latent representations obtained from client-owned autoencoders to refine the decision boundary of inliers. Notably, only model parameters are shared between organizations, preserving data privacy. The efficacy of our proposed method is evaluated on two standard financial tabular datasets and an image dataset for anomaly detection in a distributed setting. The results demonstrate a strong improvement in the classification of unknown outliers during the inference phase for each organization's model.
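To make the parameter-sharing step concrete, the following is a minimal sketch of FedAvg-style aggregation, in which a server averages client parameters weighted by local sample counts. It is an illustration under stated assumptions, not the implementation used in this work: the function name `fedavg`, the flat-list parameter layout, the parameter names, and the toy values are all hypothetical.

```python
from typing import Dict, List

# Hypothetical layout: each parameter tensor is flattened into a list of floats.
Params = Dict[str, List[float]]


def fedavg(client_params: List[Params], client_sizes: List[int]) -> Params:
    """Average client parameters, weighting each client by its sample count."""
    if not client_params or len(client_params) != len(client_sizes):
        raise ValueError("need one sample count per client")
    total = sum(client_sizes)
    if total <= 0:
        raise ValueError("total sample count must be positive")

    averaged: Params = {}
    for name in client_params[0]:
        length = len(client_params[0][name])
        # Weighted mean of each scalar entry across all clients.
        averaged[name] = [
            sum(p[name][i] * n for p, n in zip(client_params, client_sizes)) / total
            for i in range(length)
        ]
    return averaged


# Toy example (made-up values): two clients with 100 and 300 local samples.
client_a = {"encoder.weight": [1.0, 2.0], "encoder.bias": [0.0]}
client_b = {"encoder.weight": [3.0, 6.0], "encoder.bias": [1.0]}
global_params = fedavg([client_a, client_b], [100, 300])
print(global_params)
```

Only the parameter dictionaries reach the server in this sketch; the raw records behind `client_sizes` stay with each client, which is the property the approach relies on.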
