Accurate and fast anomaly detection in industrial processes and IoT environments
Simone Tonini, Andrea Vandin, Francesca Chiaromonte, Daniele Licari, Fernando Barsacchi
TL;DR
This paper tackles anomaly detection in industrial IoT time series characterized by multicollinearity and unknown distributions. It introduces SAnD, a five-step semi-supervised pipeline that combines smoothing ($h$), multicollinearity removal based on variance inflation factors, the Mahalanobis distance ($MD$), EVT-based thresholding (MVT/POT), and supervised feature-importance analysis to locate anomalies and infer their potential causes. Empirical results on eight public datasets and a real case study show that SAnD outperforms nine state-of-the-art semi-supervised methods in both detection accuracy and runtime, while providing interpretable explanations of anomalies. The approach is simple, applicable across industrial domains, and comes with replicability materials to support deployment and further research.
Abstract
We present a novel, simple and widely applicable semi-supervised procedure for anomaly detection in industrial and IoT environments, SAnD (Simple Anomaly Detection). SAnD comprises five steps, each leveraging well-known statistical tools, namely smoothing filters, variance inflation factors, the Mahalanobis distance, threshold selection algorithms and feature importance techniques. To our knowledge, SAnD is the first procedure that integrates these tools to identify anomalies and help decipher their putative causes. We show how each step contributes to tackling technical challenges that practitioners face when detecting anomalies in industrial contexts, where signals can be highly multicollinear, have unknown distributions, and intertwine short-lived noise with the long(er)-lived actual anomalies. The development of SAnD was motivated by a concrete case study from our industrial partner, which we use here to show its effectiveness. We also evaluate the performance of SAnD by comparing it with a selection of semi-supervised methods on public datasets from the literature on anomaly detection. We conclude that SAnD is effective, broadly applicable, and outperforms existing approaches in both anomaly detection accuracy and runtime.
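To make the sequence of steps concrete, the following is a minimal sketch of how the first four tools named above could be chained together. It is not the authors' implementation. Several choices are assumptions made purely for illustration: the trailing moving average as the smoothing filter, the window length of 5, the VIF cutoff of 10 (a common rule of thumb), and in particular the empirical 99% quantile of training distances, which is a simple stand-in for the threshold selection algorithms used in the paper. All function names are hypothetical, and the fifth step (feature importance) is omitted.

```python
import numpy as np

def moving_average(X, window):
    """Smooth each column with a moving average of length `window`."""
    kernel = np.ones(window) / window
    # mode="valid" returns n - window + 1 rows, so no edge padding is needed
    cols = [np.convolve(X[:, j], kernel, mode="valid") for j in range(X.shape[1])]
    return np.column_stack(cols)

def vif(X):
    """Variance inflation factors, read off the diagonal of the inverse correlation matrix."""
    corr = np.corrcoef(X, rowvar=False)
    return np.diag(np.linalg.inv(corr))

def drop_collinear(X, max_vif=10.0):
    """Repeatedly drop the column with the largest VIF until all VIFs are <= max_vif."""
    keep = list(range(X.shape[1]))
    while len(keep) > 1:
        v = vif(X[:, keep])
        worst = int(np.argmax(v))
        if v[worst] <= max_vif:
            break
        keep.pop(worst)
    return keep

def mahalanobis(X, mean, cov_inv):
    """Mahalanobis distance of each row of X from `mean`."""
    d = X - mean
    return np.sqrt(np.einsum("ij,jk,ik->i", d, cov_inv, d))

def fit(X_train, window=5, max_vif=10.0, quantile=0.99):
    """Fit on anomaly-free training data (the semi-supervised setting)."""
    S = moving_average(X_train, window)
    keep = drop_collinear(S, max_vif)
    S = S[:, keep]
    mean = S.mean(axis=0)
    cov_inv = np.linalg.inv(np.atleast_2d(np.cov(S, rowvar=False)))
    # Assumption: an empirical quantile replaces the paper's threshold selection step
    threshold = np.quantile(mahalanobis(S, mean, cov_inv), quantile)
    return {"window": window, "keep": keep, "mean": mean,
            "cov_inv": cov_inv, "threshold": threshold}

def detect(model, X):
    """Return (distances, boolean flags) for the smoothed rows of X."""
    S = moving_average(X, model["window"])[:, model["keep"]]
    md = mahalanobis(S, model["mean"], model["cov_inv"])
    return md, md > model["threshold"]

# Synthetic data: three independent signals plus a near-duplicate of the first
rng = np.random.default_rng(0)

def make_signals(n):
    base = rng.normal(size=(n, 3))
    return np.column_stack([base, base[:, 0] + 0.01 * rng.normal(size=n)])

X_train = make_signals(2000)
X_test = make_signals(1000)
X_test[500:600, 1] += 5.0  # inject a sustained shift in the second signal

model = fit(X_train)
distances, flags = detect(model, X_test)
```

On this synthetic data, one of the two nearly identical columns is removed before the covariance matrix is estimated, which keeps the matrix inversion well conditioned. Because the smoothing averages over several consecutive rows, isolated noisy samples are damped while the sustained shift in rows 500 to 599 remains well above the threshold.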
