Learning with Imperfect Models: When Multi-step Prediction Mitigates Compounding Error

Anne Somalwar, Bruce D. Lee, George J. Pappas, Nikolai Matni

TL;DR

This work analyzes the trade-off between direct multi-step prediction and autoregressive single-step rollouts in linear dynamical systems, focusing on how model specification and observability affect long-horizon accuracy. In well-specified, fully observed settings, the autoregressive single-step approach achieves lower asymptotic error than direct multi-step predictors, with the gap increasing with horizon and system stability. Under misspecification caused by partial observability, direct multi-step predictors can substantially reduce bias and outperform single-step rollouts, as the irreducible error is dominated by estimation bias rather than process noise. The findings provide principled guidance for choosing and designing multi-step predictors in learning-based control, and are supported by numerical experiments and by proofs characterizing the reducible and irreducible error components.

Abstract

Compounding error, where small prediction mistakes accumulate over time, presents a major challenge in learning-based control. For example, this issue often limits the performance of model-based reinforcement learning and imitation learning. One common approach to mitigate compounding error is to train multi-step predictors directly, rather than relying on autoregressive rollout of a single-step model. However, it is not well understood when the benefits of multi-step prediction outweigh the added complexity of learning a more complicated model. In this work, we provide a rigorous analysis of this trade-off in the context of linear dynamical systems. We show that when the model class is well-specified and accurately captures the system dynamics, single-step models achieve lower asymptotic prediction error. On the other hand, when the model class is misspecified due to partial observability, direct multi-step predictors can significantly reduce bias and thus outperform single-step approaches. These theoretical results are supported by numerical experiments, wherein we also (a) empirically evaluate an intermediate strategy which trains a single-step model using a multi-step loss and (b) evaluate the performance of single-step and multi-step predictors in a closed-loop control setting.
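The two strategies compared in the abstract can be illustrated on a toy problem. The sketch below is not the paper's experiment: it assumes a hypothetical scalar, fully observed system $x_{t+1} = a x_t + w_t$, and the names `simulate` and `fit_lag` as well as the values of `a_true`, `horizon`, and the sample size are chosen only for illustration. It fits a single-step model by least squares and rolls it out $H$ times, then fits a direct $H$-step predictor from the same trajectory.

```python
import random


def simulate(a, n_steps, rng, noise_std=1.0):
    """Simulate x_{t+1} = a * x_t + w_t with Gaussian process noise."""
    x = [0.0]
    for _ in range(n_steps):
        x.append(a * x[-1] + rng.gauss(0.0, noise_std))
    return x


def fit_lag(x, lag):
    """Least-squares scalar g minimizing sum_t (x[t + lag] - g * x[t])^2."""
    n = len(x) - lag
    num = sum(x[t] * x[t + lag] for t in range(n))
    den = sum(x[t] ** 2 for t in range(n))
    return num / den


# Hypothetical settings, not taken from the paper.
a_true, horizon = 0.9, 5
rng = random.Random(0)
x = simulate(a_true, 20000, rng)

# Single-step model, rolled out autoregressively for `horizon` steps.
a_hat = fit_lag(x, 1)
g_rollout = a_hat ** horizon

# Direct multi-step predictor: regress x_{t+H} on x_t.
g_direct = fit_lag(x, horizon)

target = a_true ** horizon  # true H-step map in the well-specified case
print("rollout error:", abs(g_rollout - target))
print("direct error: ", abs(g_direct - target))
```

In this well-specified setting both estimates approach $a^H$ as the trajectory grows; which one is closer on a single run depends on the random seed, so the asymptotic comparison stated in the abstract would need many repetitions to observe.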

Paper Structure

This paper contains 19 sections, 6 theorems, 82 equations, 5 figures.

Key Result

Proposition III.1

The reducible asymptotic error of the multi-step predictor $\hat{G}^{MS}_N$ is expressed in terms of the matrix $M_{MS} \in \mathbb{R}^{H \times H}$ with entry $(i,j)$ given by $M_{MS}^{ij} = \operatorname{tr}(A^{|i-j|})$.
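As a concrete illustration of the matrix in this proposition, the following sketch builds $M_{MS}$ entry by entry from the stated formula $M_{MS}^{ij} = \operatorname{tr}(A^{|i-j|})$. The helper names (`mat_mul`, `traces_of_powers`, `build_m_ms`) and the example matrix `A` are hypothetical, and the code covers only the construction of $M_{MS}$, not the full error expression.

```python
def mat_mul(a, b):
    """Multiply two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def traces_of_powers(a, max_power):
    """Return [tr(A^0), tr(A^1), ..., tr(A^max_power)]."""
    n = len(a)
    power = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    traces = []
    for _ in range(max_power + 1):
        traces.append(sum(power[i][i] for i in range(n)))
        power = mat_mul(power, a)
    return traces


def build_m_ms(a, horizon):
    """Build the H x H matrix whose (i, j) entry is tr(A^|i - j|)."""
    traces = traces_of_powers(a, horizon - 1)
    return [[traces[abs(i - j)] for j in range(horizon)]
            for i in range(horizon)]


# Hypothetical 2 x 2 system matrix and horizon H = 3.
A = [[0.5, 0.0], [0.0, 0.25]]
M = build_m_ms(A, 3)
for row in M:
    print(row)
```

Because each entry depends only on $|i-j|$, the matrix is symmetric and Toeplitz, so only $H$ traces need to be computed.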

Figures (5)

  • Figure 1: Comparison of the bias from the single and multi-step estimators across different horizons for the system defined in Example IV.1.
  • Figure 2: Convergence of $N\mathbf{E}[L(\hat{f}_H)]$ to the reducible prediction errors given in Proposition III.1 (multi-step predictor) and Proposition III.2 (single-step rollout) for the numerical example system with $a = 0.5, 0.75, 0.9$ (left to right) and horizon $H=5$.
  • Figure 3: Convergence of $\mathbf{E}[L(\hat{f}_H)]$ to the irreducible prediction errors of the multi-step predictor and the single-step rollout (Propositions IV.1 and IV.2) for the numerical example system with $a = 0.5, 0.75, 0.9$ (left to right) and horizon $H=5$.
  • Figure 4: Comparison of the rate of decay for the error in the well-specified case (a) and the total error in the misspecified case (b) for the direct multi-step predictor, and the single-step predictor trained with a single-step loss or a multi-step loss.
  • Figure 5: Infinite-horizon LQR performance in the well-specified case (a) and the misspecified case (b). In (b), a closed-loop spectral radius greater than 1 for the single-step predictor implies infinite LQR cost.
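The misspecified comparison summarized in these figures can be mimicked with a small sketch. It does not reproduce the paper's example system: the two-state dynamics, the choice to observe only the first state, and the names `simulate_outputs`, `fit_lag`, and `h_step_mse` are assumptions made for illustration. A scalar model is fitted to the observed output alone, so the model class cannot represent the hidden state, and the in-sample $H$-step error of the rolled-out single-step model is compared with that of a direct $H$-step fit.

```python
import random


def simulate_outputs(n_steps, rng):
    """Simulate a hypothetical 2-state system; return only the first state."""
    x1, x2, ys = 0.0, 0.0, []
    for _ in range(n_steps):
        ys.append(x1)  # partial observation: the second state stays hidden
        x1, x2 = (0.9 * x1 + 0.5 * x2 + rng.gauss(0.0, 1.0),
                  0.5 * x2 + rng.gauss(0.0, 1.0))
    return ys


def fit_lag(y, lag):
    """Least-squares scalar g minimizing sum_t (y[t + lag] - g * y[t])^2."""
    n = len(y) - lag
    num = sum(y[t] * y[t + lag] for t in range(n))
    den = sum(y[t] ** 2 for t in range(n))
    return num / den


def h_step_mse(y, gain, lag):
    """Mean squared error of predicting y[t + lag] as gain * y[t]."""
    n = len(y) - lag
    return sum((y[t + lag] - gain * y[t]) ** 2 for t in range(n)) / n


horizon = 5  # hypothetical horizon
rng = random.Random(0)
y = simulate_outputs(20000, rng)

g_rollout = fit_lag(y, 1) ** horizon   # single-step fit, rolled out H times
g_direct = fit_lag(y, horizon)         # direct H-step fit

mse_rollout = h_step_mse(y, g_rollout, horizon)
mse_direct = h_step_mse(y, g_direct, horizon)
print("rollout H-step MSE:", mse_rollout)
print("direct  H-step MSE:", mse_direct)
```

By construction the direct fit minimizes the in-sample $H$-step error over scalar predictors, so it can never be worse than the rollout on this measure; the size of the gap is what reflects the bias the rolled-out model inherits from misspecification.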

Theorems & Definitions (13)

  • Proposition III.1
  • Proposition III.2
  • Proposition IV.1
  • Proposition IV.2
  • Example IV.1
  • Lemma I.1
  • proof: Proof of the lemma on the spectral radius of CK+KB
  • proof
  • proof
  • proof
  • ...and 3 more