Segmentation over Complexity: Evaluating Ensemble and Hybrid Approaches for Anomaly Detection in Industrial Time Series
Emilio Mastriani, Alessandro Costa, Federico Incardona, Kevin Munari, Sebastiano Spinello
TL;DR
This work tackles anomaly detection in a high-dimensional industrial time series from a steam turbine, where temporal labels are uncertain and the classes are highly imbalanced. It benchmarks a simple Random Forest + XGBoost ensemble trained on segmented data against more complex feature engineering (change point statistics) and hybrid architectures (PCA-based and SVM-based combinations). The key finding is that the simple RF+XGBoost ensemble achieves the best overall performance (AUC-ROC ≈ 0.976, F1 ≈ 0.41), while the advanced features and hybrids often degrade performance because of noise, overfitting, or overlapping biases. The results favor simple models paired with domain-informed segmentation, which offer robustness, interpretability, and practical applicability in industrial anomaly detection, and they suggest that similar settings benefit more from improving segmentation quality than from adding architectural complexity.
Abstract
In this study, we investigate the effectiveness of advanced feature engineering and hybrid model architectures for anomaly detection in a multivariate industrial time series from a steam turbine system. We evaluate the impact of change point-derived statistical features, clustering-based substructure representations, and hybrid learning strategies on detection performance. Despite their theoretical appeal, these complex approaches consistently underperformed a simple Random Forest + XGBoost ensemble trained on segmented data. The ensemble achieved an AUC-ROC of 0.976, an F1-score of 0.41, and 100% early detection within the defined time window. Our findings highlight that, in scenarios with highly imbalanced and temporally uncertain data, model simplicity combined with optimized segmentation can outperform more sophisticated architectures, offering greater robustness, interpretability, and operational utility.
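To make the pipeline summarized above more concrete, the following is a minimal sketch of its general shape, not the authors' implementation. It assumes fixed-length windows as the segmentation scheme, three simple per-window statistics as features, and equal-weight averaging of two classifiers' anomaly probabilities as the ensemble rule; none of these choices are stated in the text. The names `segment`, `window_features`, `soft_vote`, and `f1_score` are hypothetical, and the probability lists stand in for the outputs of trained Random Forest and XGBoost models.

```python
from statistics import mean, pstdev


def segment(series, width, step):
    """Split a univariate series into fixed-length windows.

    Assumption: fixed-width windows; the paper's segmentation may differ.
    Trailing samples that do not fill a whole window are dropped.
    """
    return [series[i:i + width] for i in range(0, len(series) - width + 1, step)]


def window_features(window):
    """Summarize one window as (mean, population std, range)."""
    return (mean(window), pstdev(window), max(window) - min(window))


def soft_vote(p_a, p_b, weight_a=0.5):
    """Combine two models' per-segment anomaly probabilities by weighted average."""
    return [weight_a * a + (1.0 - weight_a) * b for a, b in zip(p_a, p_b)]


def f1_score(y_true, y_pred):
    """F1 for the positive (anomaly) class; returns 0.0 with no true positives."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Toy sensor trace; the last window contains a spike.
    trace = [1.0, 1.1, 0.9, 1.0, 1.2, 1.0, 1.1, 4.0, 3.8]
    windows = segment(trace, width=3, step=3)
    features = [window_features(w) for w in windows]

    # Placeholder probabilities standing in for two trained classifiers.
    p_model_a = [0.05, 0.10, 0.90]
    p_model_b = [0.15, 0.20, 0.70]
    scores = soft_vote(p_model_a, p_model_b)

    # Threshold the averaged score to get per-segment predictions.
    predictions = [1 if s >= 0.5 else 0 for s in scores]
    print(features)
    print(predictions, f1_score([0, 0, 1], predictions))
```

In practice the per-window feature tuples would be the training inputs for both classifiers, and the decision threshold would be tuned on validation data, since a highly imbalanced dataset can yield a high AUC-ROC alongside a modest F1-score.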
