Predicting machine failures from multivariate time series: an industrial case study
Nicolò Oreste Pinciroli Vago, Francesca Forbicini, Piero Fraternali
TL;DR
This study compares non-neural ML and DL approaches for predicting industrial machine failures from multivariate time series. It explicitly varies the reading window ($RW$) and the prediction window ($PW$) to assess how the forecast horizon affects performance, measured by macro $F_1$. It analyzes three diverse datasets (Wrapping machine, Blood refrigerator, Nitrogen generator) with distinct temporal patterns, using LR, RF, SVM, LSTM, ConvLSTM, and Transformer models. The results show that DL methods offer substantial gains when fault precursors are complex and diverse (notably for the wrapping machine), while the simpler, repetitive fault patterns in the other datasets yield comparable performance between ML and DL; in all cases, very long horizons degrade predictive power. The work highlights the practical importance of selecting domain-appropriate $RW$ and $PW$, describes how class imbalance is handled, and provides datasets and code to support reproducibility and extensions in predictive maintenance applications.
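Macro $F_1$ averages the per-class $F_1$ scores with equal weight, so the rare failure class counts as much as the abundant normal class. The minimal sketch below is written from scratch for illustration (the function names are hypothetical, not taken from the authors' code). It shows why the metric suits imbalanced failure data: a trivial model that never predicts a failure can reach high accuracy but only about half the maximum macro $F_1$.

```python
from typing import Sequence


def f1_for_class(y_true: Sequence[int], y_pred: Sequence[int], cls: int) -> float:
    """F1 score treating `cls` as the positive class (0.0 if it is never correctly predicted)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p == cls)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != cls and p == cls)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p != cls)
    if tp == 0:
        # No true positives: precision or recall is zero (or undefined), so F1 is 0.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def macro_f1(y_true: Sequence[int], y_pred: Sequence[int], classes=(0, 1)) -> float:
    """Unweighted mean of per-class F1 scores (binary case: 0 = normal, 1 = failure)."""
    return sum(f1_for_class(y_true, y_pred, c) for c in classes) / len(classes)


# Imbalanced toy example: 9 normal windows, 1 failure window.
y_true = [0] * 9 + [1]
always_normal = [0] * 10  # trivial model: 90% accuracy
print(macro_f1(y_true, always_normal))  # about 0.474 (= 9/19)
print(macro_f1(y_true, y_true))         # 1.0
```

In practice a library implementation such as scikit-learn's `f1_score(..., average="macro")` computes the same quantity.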
Abstract
Non-neural Machine Learning (ML) and Deep Learning (DL) models are often used to predict system failures in the context of industrial maintenance. However, only a few studies jointly assess the effect of varying the amount of past data used to make a prediction and how far into the future the forecast extends. This study evaluates the impact of the size of the reading window and of the prediction window on the performance of models trained to forecast failures in three data sets concerning the operation of (1) an industrial wrapping machine working in discrete sessions, (2) an industrial blood refrigerator working continuously, and (3) a nitrogen generator working continuously. The problem is formulated as a binary classification task that assigns the positive label to the prediction window based on the probability of a failure occurring in that interval. Six algorithms (logistic regression, random forest, support vector machine, LSTM, ConvLSTM, and Transformer) are compared using multivariate telemetry time series. The results indicate that, in the considered scenarios, the size of the prediction window plays a crucial role. They also highlight the effectiveness of DL approaches at classifying data with diverse time-dependent patterns preceding a failure, and the effectiveness of ML approaches at classifying similar and repetitive patterns preceding a failure.
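The reading-window/prediction-window formulation can be illustrated by slicing a continuous multivariate series into labeled samples. The sketch below is a simplified assumption, not the authors' preprocessing: the helper `make_windows` and its `stride` parameter are hypothetical, failures are assumed to be given as a per-time-step 0/1 indicator, and a window is labeled positive if any failure is recorded in its prediction window. Session boundaries (as for the wrapping machine) are ignored.

```python
import numpy as np


def make_windows(X, failures, rw, pw, stride=1):
    """Hypothetical helper: build (RW input, PW label) samples from one series.

    X: array of shape (T, F), F telemetry variables over T time steps.
    failures: array of shape (T,), 1 where a failure is recorded, else 0.
    Returns inputs of shape (N, rw, F) and binary labels of shape (N,).
    """
    X = np.asarray(X)
    failures = np.asarray(failures)
    T = X.shape[0]
    inputs, labels = [], []
    # The last valid start leaves room for both the reading and the prediction window.
    for start in range(0, T - rw - pw + 1, stride):
        read_end = start + rw
        inputs.append(X[start:read_end])
        # Positive label if any failure falls inside the prediction window.
        labels.append(int(failures[read_end:read_end + pw].any()))
    if not inputs:
        # Series too short for a single (RW, PW) pair.
        return np.empty((0, rw, X.shape[1])), np.empty((0,), dtype=int)
    return np.stack(inputs), np.array(labels)


# Toy series: 10 time steps, 2 variables, one failure at t = 7.
X = np.arange(20).reshape(10, 2)
failures = np.zeros(10, dtype=int)
failures[7] = 1
inputs, labels = make_windows(X, failures, rw=3, pw=2)
print(inputs.shape)      # (6, 3, 2)
print(labels.tolist())   # [0, 0, 0, 1, 1, 0]
```

Increasing `pw` labels more windows as positive and pushes the failure further from the observed data, which is one way to see why long prediction windows make the task harder. Each row of `inputs` can be flattened for the non-neural models or fed as a sequence to the LSTM, ConvLSTM, or Transformer.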
