Epistemic Error Decomposition for Multi-step Time Series Forecasting: Rethinking Bias-Variance in Recursive and Direct Strategies

Riku Green, Huw Day, Zahraa S. Abdallah, Telmo M. Silva Filho

TL;DR

This work tackles the puzzling behavior of recursive versus direct multi-step time-series forecasting by proposing an epistemic error decomposition into irreducible noise, a structural gap, and estimation variance. It shows that linear predictors have zero structural gap ($G_{\Delta}=0$), while nonlinear recursion can alter expressivity and yield data-dependent bias, with estimation variance amplified through a Jacobian of the composition map. A delta-method framework derives a Jacobian-based expression for recursion-induced variance, introducing an amplification factor $T_h$ that links one-step variance to multi-step variance, and distinguishes the roles of process versus measurement noise in strategy selection. Empirical results on MLPs with the ETTm1 dataset corroborate the theoretical insights, demonstrating lower bias but higher variance for recursion under certain noise regimes, and offering practical guidance for choosing between recursive and direct strategies beyond traditional bias–variance heuristics.

Abstract

Multi-step forecasting is often described through a simple rule of thumb: recursive strategies are said to have high bias and low variance, while direct strategies are said to have low bias and high variance. We revisit this belief by decomposing the expected multi-step forecast error into three parts: irreducible noise, a structural approximation gap, and an estimation-variance term. For linear predictors we show that the structural gap is identically zero for any dataset. For nonlinear predictors, however, the repeated composition used in recursion can increase model expressivity, making the structural gap depend on both the model and the data. We further show that the estimation variance of the recursive strategy at any horizon can be written as the one-step variance multiplied by a Jacobian-based amplification factor that measures how sensitive the composed predictor is to parameter error. This perspective explains when recursive forecasting may simultaneously have lower bias and higher variance than direct forecasting. Experiments with multilayer perceptrons on the ETTm1 dataset confirm these findings. The results offer practical guidance for choosing between recursive and direct strategies based on model nonlinearity and noise characteristics, rather than relying on traditional bias-variance intuition.
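The two strategies contrasted in the abstract can be sketched in a few lines. This is a minimal illustration, not the authors' code: it assumes a synthetic AR(2) series with hypothetical coefficients (0.5, 0.3), plain least-squares linear predictors, and helper names (`make_lagged`, `recursive_forecast`, `direct_forecast`) invented for this example. The recursive strategy fits a one-step model and feeds its predictions back in; the direct strategy fits a separate model for the target horizon.

```python
import numpy as np

def make_lagged(y, lags, horizon):
    """Build regressors (most recent value first) and targets `horizon` steps ahead."""
    rows = range(lags - 1, len(y) - horizon)
    X = np.array([y[t - lags + 1 : t + 1][::-1] for t in rows])
    target = np.array([y[t + horizon] for t in rows])
    return X, target

def fit_linear(X, target):
    """Ordinary least squares without an intercept."""
    coef, *_ = np.linalg.lstsq(X, target, rcond=None)
    return coef

def recursive_forecast(coef, window, horizon):
    """Iterate a one-step model, feeding each prediction back as the newest lag."""
    window = np.array(window, dtype=float)  # most recent value first
    for _ in range(horizon):
        y_next = coef @ window
        window = np.concatenate(([y_next], window[:-1]))
    return window[0]

def direct_forecast(coef_h, window):
    """Apply a model that was trained directly on the h-step-ahead target."""
    return coef_h @ np.asarray(window, dtype=float)

# Hypothetical data-generating process: a stable AR(2) with unit process noise.
rng = np.random.default_rng(0)
n = 20000
noise = rng.normal(size=n)
y = np.zeros(n)
for t in range(2, n):
    y[t] = 0.5 * y[t - 1] + 0.3 * y[t - 2] + noise[t]

h = 2
b = fit_linear(*make_lagged(y, lags=2, horizon=1))    # one-step model
c = fit_linear(*make_lagged(y, lags=2, horizon=h))    # direct two-step model

window = [1.0, 2.0]                                   # (y_t, y_{t-1}), chosen arbitrarily
rec_pred = recursive_forecast(b, window, h)
dir_pred = direct_forecast(c, window)
print(rec_pred, dir_pred)
```

With a linear model class and a linear process, both strategies target the same two-step predictor, so the two printed forecasts should be close at large sample sizes. This matches the abstract's statement that the structural gap vanishes for linear predictors; the interesting differences arise once the one-step model is nonlinear.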

Paper Structure

This paper contains 44 sections, 3 theorems, 59 equations, 5 figures, 1 table.

Key Result

Theorem 1

Under standard regularity for linear-in-parameters models with exogenous regressors, the recursive $h$-ahead estimation variance satisfies [...]. Let the one-step baseline be $\mathrm{EV}_{1\text{-step}}=\mathrm{tr}(\Sigma_\theta Q)$. Define the (dimensionless) amplification $T_h$ [...]

Figures (5)

  • Figure 1: Recursive composition expands the representable function space and can reduce bias. (a) Schematic: direct strategies (in $c$) span a 3D subspace, while recursive strategies (in $\alpha$ under recursive composition) occupy a richer 5D subspace. We show a 6D task manifold (in $\theta$). Green and red tasks mark cases where $\alpha$ is closer to or further from the task setting, respectively. (b) Empirical coefficient-structure bias: recursive models ($\alpha$-space) lie systematically closer to uniformly sampled task parameters than direct ones ($c$-space). (c) Pairwise MSE comparison: recursive predictors can, but do not necessarily, achieve lower bias than direct models.
  • Figure 2: Mapping geometry and variance propagation between one-step and two-step forecasting models. Left: one-step linear parameter space $(b_1,b_2)$ for two synthetic tasks (A, B), each defining a local region of fitted one-step models $\hat{y}_{t+1}=b_1y_t+b_2y_{t-1}$. Middle: corresponding two-step coefficient space $(\phi_1,\phi_2)$ obtained via recursive composition (solid ellipses, $\alpha$) and via direct two-step fitting (dashed ellipses, $c$). The nonlinear mapping from $b$-space to $\phi$-space distorts local geometry, leading to either amplification or contraction of parameter variance depending on position. Right: total coefficient variance before mapping ($b$), after recursive mapping ($\alpha$), and for directly fitted models ($c$). For illustration, we assume the direct coefficients have the same variance as in the one-step task, although this is likely to vary in practice.
  • Figure 3: Empirical Validation of Theoretical EV and the Impact of Estimator Bias. This figure analyzes our theoretical model's performance by sweeping across stable AR(2) parameters ($a, \gamma$) and varying process ($\sigma_s$) and measurement ($\sigma_e$) noise. (A) In the ideal case with minimal measurement noise ($\sigma_e \approx 0$), the OLS estimator is unbiased. Here, the empirical EV aligns almost perfectly with our theoretical prediction, validating the fundamental accuracy of the Jacobian-based formulation. (B) This panel maps the theory's goodness-of-fit by plotting the Pearson correlation between theoretical and empirical EV for each noise configuration. A clear degradation is visible as measurement noise dominates process noise. (C) This panel diagnoses the root cause of the breakdown. It visualizes the parameter bias of the OLS estimator, measured as the Euclidean distance between the true DGP parameters and the average converged coefficients. The heatmap shows that the estimator bias grows significantly with measurement noise, mirroring the pattern of degradation seen in panel (B).
  • Figure 4: MLP forecasting on ETTm1 ($h{=}2$): comparison of learning curves and ratio-based metrics. Each panel compares recursive and direct MLPs trained under identical capacity and data conditions (lags$=50$, width$=2$, 50 seeds). (left) Learning curves show effective learning and capacity saturation. (middle) $\rho_{MSE}$, the relative MSE, and (right) $\rho_{VAR}$ denote the ratios of recursive to direct errors ($\mathrm{Rec}/\mathrm{Dir}$), where $y{=}1$ indicates parity. Across sample sizes, the recursive model exhibits lower bias (train/test $\rho_{MSE}<1$ at large $N$) but higher estimation variance ($\rho_{VAR}>1$), supporting our theoretical findings that recursion does not necessarily have higher bias and can amplify variance.
  • Figure 5: Comparison of analytic and empirical proportions across $\sigma_\varepsilon$ and $\sigma_s$. Left: analytic $\Delta \varepsilon$ proportion. Right: empirical MSE proportion.
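The estimator-bias effect described for Figure 3, panel (C), can be reproduced qualitatively with a small simulation. The sketch below is an assumption-laden illustration rather than the paper's experiment: the AR(2) coefficients (0.5, 0.3), the noise levels, and the helper names are hypothetical, and a single long series stands in for the paper's average over converged fits. A latent AR(2) state is driven by process noise $\sigma_s$ and observed through additive measurement noise $\sigma_e$; OLS is then fitted to the noisy observations.

```python
import numpy as np

def simulate_noisy_ar2(phi, sigma_s, sigma_e, n, rng):
    """Latent AR(2) state with process noise, observed with additive measurement noise."""
    eps = sigma_s * rng.normal(size=n)
    s = np.zeros(n)
    for t in range(2, n):
        s[t] = phi[0] * s[t - 1] + phi[1] * s[t - 2] + eps[t]
    return s + sigma_e * rng.normal(size=n)

def ols_ar2(y):
    """Least-squares fit of y_t on (y_{t-1}, y_{t-2}), no intercept."""
    X = np.column_stack([y[1:-1], y[:-2]])
    coef, *_ = np.linalg.lstsq(X, y[2:], rcond=None)
    return coef

def parameter_bias(phi, sigma_s, sigma_e, n, rng):
    """Euclidean distance between the true and the fitted AR(2) coefficients."""
    y = simulate_noisy_ar2(phi, sigma_s, sigma_e, n, rng)
    return float(np.linalg.norm(ols_ar2(y) - phi))

rng = np.random.default_rng(0)
phi = np.array([0.5, 0.3])          # hypothetical stable AR(2) parameters
bias_clean = parameter_bias(phi, sigma_s=1.0, sigma_e=0.0, n=20000, rng=rng)
bias_noisy = parameter_bias(phi, sigma_s=1.0, sigma_e=1.0, n=20000, rng=rng)
print(bias_clean, bias_noisy)
```

Measurement noise contaminates the regressors as well as the target, which is the classical errors-in-variables setting: the fitted coefficients are pulled away from the true ones even with abundant data. Any variance formula that assumes an unbiased one-step estimator would therefore be expected to degrade as $\sigma_e$ grows relative to $\sigma_s$, consistent with panels (B) and (C).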

Theorems & Definitions (4)

  • Theorem 1: Jacobian-driven EV for recursion
  • Proof: Proof sketch
  • Lemma 1: Delta method for the composition map
  • Proposition 1: EV for recursion
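The delta-method idea behind Lemma 1 can be illustrated on the two-step linear example from Figure 2. Substituting the one-step model $\hat{y}_{t+1}=b_1y_t+b_2y_{t-1}$ into itself gives $\hat{y}_{t+2}=(b_1^2+b_2)\,y_t+b_1b_2\,y_{t-1}$, so the composition map is $(b_1,b_2)\mapsto(\phi_1,\phi_2)=(b_1^2+b_2,\;b_1b_2)$. The sketch below propagates an assumed one-step parameter covariance through the Jacobian of this map and compares the result with Monte Carlo sampling. The covariance `Sigma_b`, the evaluation points, and the trace ratio `variance_ratio` are illustrative choices made here; the ratio is not the paper's definition of $T_h$.

```python
import numpy as np

def compose_two_step(b):
    """Two-step coefficients implied by recursing a one-step AR(2)-style linear model."""
    b1, b2 = b
    return np.array([b1**2 + b2, b1 * b2])

def jacobian_two_step(b):
    """Jacobian of the composition map (b1, b2) -> (b1^2 + b2, b1*b2)."""
    b1, b2 = b
    return np.array([[2.0 * b1, 1.0],
                     [b2,       b1]])

def delta_cov(b, Sigma):
    """First-order (delta-method) covariance of the composed coefficients."""
    J = jacobian_two_step(b)
    return J @ Sigma @ J.T

def variance_ratio(b, Sigma):
    """Total composed-coefficient variance relative to total one-step variance."""
    return np.trace(delta_cov(b, Sigma)) / np.trace(Sigma)

b0 = np.array([0.5, 0.3])             # hypothetical fitted one-step parameters
Sigma_b = 1e-4 * np.eye(2)            # assumed one-step parameter covariance

delta_trace = np.trace(delta_cov(b0, Sigma_b))

# Monte Carlo check: sample one-step parameters, push each through the map.
rng = np.random.default_rng(0)
samples = rng.multivariate_normal(b0, Sigma_b, size=200_000)
mapped = np.column_stack([samples[:, 0]**2 + samples[:, 1],
                          samples[:, 0] * samples[:, 1]])
mc_trace = np.trace(np.cov(mapped, rowvar=False))

print(delta_trace, mc_trace)
print(variance_ratio(b0, Sigma_b), variance_ratio(np.array([0.1, 0.1]), Sigma_b))
```

Because the Jacobian depends on where $(b_1,b_2)$ sits, the same one-step covariance can be amplified at one point and contracted at another, which is the position dependence described in the Figure 2 caption. The first-order approximation is only expected to be accurate when the parameter covariance is small relative to the curvature of the map.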