M$^2$AD: Multi-Sensor Multi-System Anomaly Detection through Global Scoring and Calibrated Thresholding
Sarah Alnegheimish, Zelin He, Matthew Reimherr, Akash Chandrayan, Abhinav Pradhan, Luca D'Angelo
TL;DR
M$^2$AD tackles anomaly detection in heterogeneous multivariate time series across multiple systems. It forecasts normal behavior with an LSTM, computes per-sensor residuals, and combines them into a global anomaly score $S_t$ through a Gaussian Mixture Model and Gamma-calibrated aggregation. The approach is interpretable, since it identifies the sensors that contribute most to each detection, and it comes with theoretical guarantees on error quantification and p-value calibration under dependencies. Empirical results on public benchmark datasets (MSL, SMAP, SMD) show an average improvement of about 21% over baselines, and a real-world case study on 130 assets in Amazon Fulfillment Centers demonstrates practical impact, aided by covariates and robust thresholding. The work delivers a scalable, calibrated, and interpretable framework for industrial multi-sensor anomaly detection, with code and results shared publicly.
Abstract
With the widespread availability of sensor data across industrial and operational systems, we frequently encounter heterogeneous time series from multiple systems. Anomaly detection is crucial for such systems to facilitate predictive maintenance. However, most existing anomaly detection methods are designed for either univariate or single-system multivariate data, making them insufficient for these complex scenarios. To address this, we introduce M$^2$AD, a framework for unsupervised anomaly detection in multivariate time series data from multiple systems. M$^2$AD employs deep models to capture expected behavior under normal conditions, using the residuals as indicators of potential anomalies. These residuals are then aggregated into a global anomaly score through a Gaussian Mixture Model and Gamma calibration. We theoretically demonstrate that this framework can effectively address heterogeneity and dependencies across sensors and systems. Empirically, M$^2$AD outperforms existing methods in extensive evaluations by 21% on average, and its effectiveness is demonstrated on a large-scale real-world case study on 130 assets in Amazon Fulfillment Centers. Our code and results are available at https://github.com/sarahmish/M2AD.
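The scoring idea described above (per-sensor residuals, a fitted density, aggregation into a global score, Gamma calibration) can be illustrated with a minimal standard-library sketch. It is not the authors' implementation. It makes several simplifying assumptions that do not come from the paper: a single Gaussian per sensor stands in for the Gaussian Mixture Model, the per-sensor p-values are combined with a Fisher-style sum $-2\sum_j \log p_j$, the Gamma distribution is fitted by the method of moments, and the residuals are simulated rather than produced by a forecasting model. All function names (`sensor_pvalues`, `global_score`, `fit_gamma_moments`, `gamma_sf`, `calibrated_pvalue`) are hypothetical.

```python
import math
import random
from statistics import NormalDist, fmean, pvariance


def sensor_pvalues(residuals, fitted):
    """Two-sided p-value of each residual under that sensor's fitted normal.

    Assumption: one Gaussian per sensor replaces the GMM used in the paper.
    """
    pvals = []
    for r, dist in zip(residuals, fitted):
        z = abs(r - dist.mean) / dist.stdev
        p = 2.0 * (1.0 - NormalDist().cdf(z))
        pvals.append(max(p, 1e-300))  # guard against log(0)
    return pvals


def global_score(pvals):
    """Fisher-style aggregation (an assumption): S_t = -2 * sum(log p_j)."""
    return -2.0 * sum(math.log(p) for p in pvals)


def fit_gamma_moments(scores):
    """Method-of-moments Gamma fit: shape = mean^2 / var, scale = var / mean."""
    m = fmean(scores)
    v = pvariance(scores)
    return m * m / v, v / m


def gamma_sf(x, shape, scale):
    """Gamma survival function via the series for the regularized lower
    incomplete gamma function. Adequate for a sketch; very small tail
    probabilities lose precision because of the final subtraction."""
    if x <= 0:
        return 1.0
    y = x / scale
    term, total = 1.0, 1.0
    for n in range(1, 1000):
        term *= y / (shape + n)
        total += term
        if term < 1e-15 * total:
            break
    cdf = math.exp(shape * math.log(y) - y - math.lgamma(shape + 1.0)) * total
    return max(0.0, 1.0 - cdf)


# Simulated "normal" residuals for three sensors with very different scales.
random.seed(0)
scales = [0.5, 2.0, 10.0]
train = [[random.gauss(0.0, s) for s in scales] for _ in range(2000)]

# Fit one density per sensor, then calibrate the global score on training data.
fitted = [NormalDist.from_samples([row[j] for row in train])
          for j in range(len(scales))]
train_scores = [global_score(sensor_pvalues(row, fitted)) for row in train]
shape, scale = fit_gamma_moments(train_scores)


def calibrated_pvalue(residuals):
    """Gamma-calibrated p-value of the global score at one time step."""
    return gamma_sf(global_score(sensor_pvalues(residuals, fitted)), shape, scale)


p_normal = calibrated_pvalue([0.25, 1.0, 5.0])  # every sensor about 0.5 sigma off
p_anom = calibrated_pvalue([0.0, 12.0, 0.0])    # second sensor about 6 sigma off
print(f"shape={shape:.2f} scale={scale:.2f}")
print(f"p_normal={p_normal:.3g} p_anom={p_anom:.3g}")
```

Because each sensor is first mapped to a p-value, sensors with very different residual scales contribute on a common footing. Fitting the Gamma parameters to the observed training scores, rather than assuming a fixed reference distribution, is what lets the calibration absorb some dependence between sensors in this sketch. A detection threshold would then be a cutoff on the calibrated p-value.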
