Comparing statistical and machine learning methods for time series forecasting in data-driven logistics -- A simulation study

Lena Schmid, Moritz Roidl, Markus Pauly

TL;DR

This study benchmarks out-of-the-box forecasting approaches for data-driven logistics by contrasting traditional time-series models (ARIMA, SARIMA, TBATS) with tree-based ML methods (Random Forest, XGBoost) across a broad set of simulated data-generating processes, queueing models, and added complexities such as jumps and noise. Using sliding-window preprocessing and a fixed set of hyperparameters, the authors demonstrate that Random Forest with input differentiation is a robust baseline, particularly under nonlinear and nonstationary conditions, while traditional TS methods remain competitive in many settings. Real-world data from a Brazilian logistics firm corroborate the simulation results, with differentiated Random Forest achieving superior accuracy across products. The findings suggest practical guidance for practitioners: start with a differentiated RF as a strong, low-tuning baseline, while leveraging TS models for interpretability and efficiency; future work should explore multi-step forecasting and uncertainty quantification.

Abstract

Many planning and decision activities in logistics and supply chain management are based on forecasts of multiple time-dependent factors. Therefore, the quality of planning depends on the quality of the forecasts. We compare various forecasting methods in terms of out-of-the-box forecasting performance on a broad set of simulated time series. We simulate various linear and non-linear time series and look at the one-step forecast performance of statistical learning methods.
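
The sliding-window preprocessing and input differentiation mentioned in the TL;DR can be illustrated with a minimal sketch. It is not the authors' implementation: the helper names (`difference`, `sliding_window`, `mse`), the toy series, and the window size are assumptions made for illustration, and the window-mean predictor is only a placeholder for the position where a Random Forest or XGBoost regressor would be fitted.

```python
from typing import List, Tuple


def difference(series: List[float]) -> List[float]:
    """First-order differences: d[t] = y[t+1] - y[t]."""
    return [b - a for a, b in zip(series, series[1:])]


def sliding_window(
    series: List[float], window: int
) -> Tuple[List[List[float]], List[float]]:
    """Build (features, target) pairs for one-step forecasting.

    Each feature row holds `window` consecutive values; the target is
    the value that immediately follows them.
    """
    if window < 1 or window >= len(series):
        raise ValueError("window must satisfy 1 <= window < len(series)")
    n_rows = len(series) - window
    X = [series[i:i + window] for i in range(n_rows)]
    y = [series[i + window] for i in range(n_rows)]
    return X, y


def mse(actual: List[float], predicted: List[float]) -> float:
    """Mean squared error between two equally long sequences."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


# Toy series, made up for illustration (not data from the paper).
series = [3.0, 5.0, 4.0, 6.0, 8.0, 7.0, 9.0]
window = 3  # assumed window size

# Variant 1: windows over the raw values.
X_raw, y_raw = sliding_window(series, window)

# Variant 2: windows over the differenced values ("diff" variant).
diffs = difference(series)
X_diff, y_diff = sliding_window(diffs, window)

# Placeholder model: predict the next difference as the window mean.
# A tree-based regressor would be trained on (X_diff, y_diff) instead.
pred_diff = [sum(row) / len(row) for row in X_diff]

# Undo the differencing: forecast = last observed level + predicted change.
last_levels = [series[i + window] for i in range(len(X_diff))]
actual = [series[i + window + 1] for i in range(len(X_diff))]
forecast = [lvl + d for lvl, d in zip(last_levels, pred_diff)]

print("MSE, placeholder on differences:", mse(actual, forecast))
print("MSE, naive last-value forecast: ", mse(actual, last_levels))
```

Predicting the change and then adding it back to the last observed level keeps the targets in a range the model has already seen. This matters for tree-based learners, which cannot extrapolate beyond their training targets when a series trends upward or downward.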

Paper Structure (31 sections, 4 equations, 25 figures, 3 tables)

Figures (25)

  • Figure 1: MSE of ML approaches separated by the sliding window size for the M/M/1 setting. XGB stands for XGBoost and RF for Random Forest; diff in the method name indicates that the data were differentiated.
  • Figure 2: MSE of ML approaches separated by the sliding window size for the M/M/2 setting. XGB stands for XGBoost and RF for Random Forest; diff in the method name indicates that the data were differentiated.
  • Figure 3: MSE of time series and naive approaches for the M/M/1 (left) and M/M/2 (right) setting. ARIMA and SARIMA models have identical MSE values, as no seasonality was present.
  • Figure 4: MSE of the Random Forest approaches separated by the sliding window size and differentiation for the different data generating processes.
  • Figure 5: MSE of XGBoost approaches separated by the sliding window size and differentiation for the different data generating processes.
  • ...and 20 more figures