Comparing statistical and machine learning methods for time series forecasting in data-driven logistics -- A simulation study
Lena Schmid, Moritz Roidl, Markus Pauly
TL;DR
This study benchmarks out-of-the-box forecasting approaches for data-driven logistics by contrasting traditional time-series models (ARIMA, SARIMA, TBATS) with tree-based ML methods (Random Forest, XGBoost) across a broad set of simulated data-generating processes, queueing models, and added complexities such as jumps and noise. Using sliding-window preprocessing and a fixed set of hyperparameters, the authors show that Random Forest trained on differenced inputs is a robust baseline, particularly under nonlinear and nonstationary conditions, while traditional time-series methods remain competitive in many settings. Real-world data from a Brazilian logistics firm corroborate the simulation results, with the differenced Random Forest achieving superior accuracy across products. The findings suggest practical guidance for practitioners: start with a differenced Random Forest as a strong, low-tuning baseline, and use time-series models where interpretability and computational efficiency matter. Future work should explore multi-step forecasting and uncertainty quantification.
Abstract
Many planning and decision activities in logistics and supply chain management are based on forecasts of multiple time-dependent factors. Therefore, the quality of planning depends on the quality of the forecasts. We compare various forecasting methods in terms of out-of-the-box forecasting performance on a broad set of simulated time series. We simulate various linear and non-linear time series and look at the one-step forecast performance of statistical learning methods.
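
To make the one-step setup concrete, the following is a minimal sketch of sliding-window preprocessing combined with first-order differencing of the inputs. It is an illustration under stated assumptions, not the authors' implementation: the function names (`difference`, `sliding_windows`, `one_step_forecast`), the window width, and the toy series are all hypothetical, and the `mean_of_window` learner is only a stand-in for a real regressor such as a Random Forest.

```python
from typing import Callable, List, Sequence, Tuple


def difference(series: Sequence[float]) -> List[float]:
    """First-order differences: d[t] = y[t+1] - y[t]."""
    return [b - a for a, b in zip(series, series[1:])]


def sliding_windows(
    values: Sequence[float], width: int
) -> Tuple[List[List[float]], List[float]]:
    """Turn a series into (lagged window, next value) training pairs."""
    n_pairs = len(values) - width
    X = [list(values[i:i + width]) for i in range(n_pairs)]
    y = [values[i + width] for i in range(n_pairs)]
    return X, y


def one_step_forecast(
    series: Sequence[float],
    width: int,
    predict_diff: Callable[[List[List[float]], List[float], List[float]], float],
) -> float:
    """Predict the next difference, then add it back to the last observation."""
    diffs = difference(series)
    X, y = sliding_windows(diffs, width)
    last_window = list(diffs[-width:])
    return series[-1] + predict_diff(X, y, last_window)


def mean_of_window(
    X: List[List[float]], y: List[float], last_window: List[float]
) -> float:
    # Stand-in learner (an assumption for illustration, not from the paper):
    # it ignores the training pairs and averages the most recent differences.
    return sum(last_window) / len(last_window)


# Hypothetical toy series with a constant upward trend of 2 per step.
series = [10.0, 12.0, 14.0, 16.0, 18.0, 20.0]
forecast = one_step_forecast(series, width=3, predict_diff=mean_of_window)
print(forecast)
```

In practice, `predict_diff` would fit a regressor on `(X, y)` and predict from `last_window`; differencing lets the model learn changes rather than levels, which is one plausible reason it helps on trending, nonstationary series.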
