Anomaly Detection in High-Dimensional Bank Account Balances via Robust Methods
Federico Maddanu, Tommaso Proietti, Riccardo Crupi
TL;DR
This work tackles point anomaly detection in high-dimensional bank balance time series by deploying two robust paradigms: distance-based estimators (OGK, MRCD, COM) on trend-removed residuals and forecasting-based models (RobHAR, RobNHAR) that operate directly on robust residuals to avoid high-dimensional covariances. Through simulations and large-scale ISP data, the authors show robust residual approaches are essential when trend and seasonality are present, with forecasting-based methods offering scalable real-time capability; COM and RobAR provide competitive performance at lower computational cost. The study also uses CUSUM-inspired classification to distinguish between additive outliers and level-shift outliers and leverages clustering to refine contaminated series, identifying a small, highly anomalous subset for targeted forecasting. Overall, the methods detect a meaningful fraction of daily transactions as potential anomalies while enabling practical deployment on datasets with up to millions of time series.
Abstract
Detecting point anomalies in bank account balances is essential for financial institutions, as it enables the identification of potential fraud, operational issues, or other irregularities. Robust statistics is useful for flagging outliers and for providing estimates of the data distribution parameters that are not affected by contaminated observations. However, such strategies are often less efficient and computationally expensive in high-dimensional settings. In this paper, we propose and empirically evaluate several robust approaches that may be computationally efficient for medium- and high-dimensional datasets, combining high breakdown points with low computational time. Our application deals with around 2.6 million daily records of anonymous users' bank account balances.
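
To make the idea of flagging outliers with estimates that are not affected by contaminated observations concrete, the following is a minimal univariate sketch. It is not the paper's OGK, MRCD, COM, or forecasting-based estimators. As a simple stand-in, it removes a slowly varying level with a centered rolling median and scores the residuals with the median and the MAD. The function names, the window of 7, the threshold of 3.5, and the synthetic series are all assumptions made for illustration.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def rolling_median(x, window=7):
    """Centered rolling median with edge padding (window assumed odd)."""
    x = np.asarray(x, dtype=float)
    half = window // 2
    padded = np.pad(x, half, mode="edge")
    return np.median(sliding_window_view(padded, window), axis=1)


def robust_zscores(r):
    """Standardise by median and MAD; 1.4826 rescales the MAD to match
    the standard deviation under a normal model."""
    r = np.asarray(r, dtype=float)
    med = np.median(r)
    mad = 1.4826 * np.median(np.abs(r - med))
    if mad == 0:
        # Degenerate case: no spread to measure, so nothing is scored.
        return np.zeros_like(r)
    return (r - med) / mad


def flag_point_anomalies(balance, window=7, threshold=3.5):
    """Return indices whose detrended residual has a large robust z-score."""
    balance = np.asarray(balance, dtype=float)
    residuals = balance - rolling_median(balance, window)
    return np.flatnonzero(np.abs(robust_zscores(residuals)) > threshold)


# Synthetic daily balance (illustrative only): mild trend plus noise,
# with one additive outlier injected at index 60.
rng = np.random.default_rng(0)
t = np.arange(120)
balance = 1000.0 + 0.1 * t + rng.normal(0.0, 1.0, size=t.size)
balance[60] += 50.0

flagged = flag_point_anomalies(balance)
print(flagged)
```

Because both the trend estimate and the scale estimate are medians, a single large spike barely moves them, so the contaminated day stands out instead of masking itself. A mean and standard deviation computed on the same residuals would be inflated by the spike they are meant to detect.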
