Epistemic Error Decomposition for Multi-step Time Series Forecasting: Rethinking Bias-Variance in Recursive and Direct Strategies
Riku Green, Huw Day, Zahraa S. Abdallah, Telmo M. Silva Filho
TL;DR
This work revisits the contrast between recursive and direct multi-step time-series forecasting by proposing an epistemic error decomposition into irreducible noise, a structural gap, and estimation variance. It shows that linear predictors have zero structural gap ($G_{\Delta}=0$), whereas nonlinear recursion can alter expressivity and yield data-dependent bias, with estimation variance amplified through the Jacobian of the composition map. A delta-method argument yields a Jacobian-based expression for recursion-induced variance, introducing an amplification factor $T_h$ that links one-step variance to multi-step variance, and distinguishes the roles of process and measurement noise in strategy selection. Experiments with MLPs on the ETTm1 dataset corroborate the theory, showing lower bias but higher variance for recursion under certain noise regimes, and offer practical guidance for choosing between recursive and direct strategies beyond traditional bias–variance heuristics.
Abstract
Multi-step forecasting is often described through a simple rule of thumb: recursive strategies are said to have high bias and low variance, while direct strategies are said to have low bias and high variance. We revisit this belief by decomposing the expected multi-step forecast error into three parts: irreducible noise, a structural approximation gap, and an estimation-variance term. For linear predictors we show that the structural gap is identically zero for any dataset. For nonlinear predictors, however, the repeated composition used in recursion can increase model expressivity, making the structural gap depend on both the model and the data. We further show that the estimation variance of the recursive strategy at any horizon can be written as the one-step variance multiplied by a Jacobian-based amplification factor that measures how sensitive the composed predictor is to parameter error. This perspective explains when recursive forecasting may simultaneously have lower bias and higher variance than direct forecasting. Experiments with multilayer perceptrons on the ETTm1 dataset confirm these findings. The results offer practical guidance for choosing between recursive and direct strategies based on model nonlinearity and noise characteristics, rather than relying on traditional bias-variance intuition.
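The variance-amplification claim can be illustrated with a minimal sketch under strong simplifying assumptions. It uses a scalar linear AR(1) one-step map $f(x; a) = a x$ rather than the MLPs studied in the paper, so the $h$-step recursive forecast is $a^h x_0$ and the delta method gives a first-order variance of $(h a^{h-1} x_0)^2 \, \mathrm{Var}(\hat{a})$. The function names and the scalar `amplification` ratio below are hypothetical stand-ins chosen for illustration; they are not the paper's definition of $T_h$.

```python
import numpy as np

# Illustrative assumption: scalar AR(1) one-step map f(x; a) = a * x.
# This is a toy stand-in, not the paper's MLP setup or its exact T_h.

def recursive_forecast(a, x0, h):
    """Apply the one-step map x -> a * x a total of h times."""
    x = x0
    for _ in range(h):
        x = a * x
    return x

def amplification(a, h):
    """Squared parameter sensitivity of a**h relative to the one-step case."""
    return (h * a ** (h - 1)) ** 2

def delta_method_variance(a, x0, h, var_a):
    """First-order (delta-method) variance of the h-step recursive forecast."""
    one_step_variance = x0 ** 2 * var_a
    return amplification(a, h) * one_step_variance

if __name__ == "__main__":
    a_true, x0, var_a = 0.9, 1.0, 0.01 ** 2
    rng = np.random.default_rng(0)
    # Simulated estimation error in the fitted coefficient.
    a_hat = a_true + rng.normal(0.0, np.sqrt(var_a), size=100_000)
    for h in (1, 2, 5, 10):
        mc = recursive_forecast(a_hat, x0, h).var()   # Monte Carlo variance
        dm = delta_method_variance(a_true, x0, h, var_a)
        print(f"h={h:2d}  monte_carlo={mc:.3e}  delta_method={dm:.3e}  "
              f"amplification={amplification(a_true, h):.2f}")
```

In this toy case the multi-step variance is the one-step variance times a factor that depends only on the horizon and the sensitivity of the composed map to the parameter. The Monte Carlo and delta-method values should agree closely when the parameter error is small, which is the regime in which a first-order expansion is expected to hold.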
