Addressing Challenges in Time Series Forecasting: A Comprehensive Comparison of Machine Learning Techniques
Seyedeh Azadeh Fallah Mortezanejad, Ruochen Wang
TL;DR
The paper tackles the challenge of selecting effective time series (TS) forecasting methods by systematically comparing a wide range of machine learning (ML) approaches against ARIMA on three kinds of datasets: complete data, data with outliers, and data with missing values. It provides a structured overview of model families (RNNs, tree ensembles, specialized TS models, other NN architectures, and advanced techniques), complemented by concrete guidance on feature engineering and data handling. The study highlights LightGBM as a consistently strong performer across scenarios, while also emphasizing the utility of methods like TFT, N-BEATS, N-HiTS, and WBT in particular data regimes. The findings offer practical guidance for practitioners managing TS data with nonstationarity, outliers, or incomplete observations, underscoring the value of tailored preprocessing and model selection. This work thus informs both methodological choices and real-world deployment of TS forecasting systems.
Abstract
The explosion of Time Series (TS) data, driven by advancements in technology, necessitates sophisticated analytical methods. Modern management systems increasingly rely on analyzing this data, highlighting the importance of efficient processing techniques. State-of-the-art Machine Learning (ML) approaches for TS analysis and forecasting are becoming prevalent. This paper compiles and briefly describes algorithms suitable for TS regression tasks. We compare these algorithms against each other and against the classic ARIMA method on diverse datasets: complete data, data with outliers, and data with missing values. The focus is on forecasting accuracy, particularly for long-term predictions. This research aids in selecting the most appropriate algorithm based on forecasting needs and data characteristics.
