Quadratic Direct Forecast for Training Multi-Step Time-Series Forecast Models

Hao Wang, Licheng Pan, Yuan Lu, Zhichao Chen, Tianqiao Liu, Shuting He, Zhixuan Chu, Qingsong Wen, Haoxuan Li, Zhouchen Lin

TL;DR

This work tackles two fundamental issues in training multi-step time-series forecasters: the bias from ignoring label autocorrelation and the misallocation of learning emphasis across horizons. It introduces Quadratic Direct Forecast (QDF), a learnable, quadratic-form objective with a weighting matrix $\boldsymbol{\Sigma}$ that captures both autocorrelation (off-diagonal) and horizon-dependent weights (diagonal). Through a bilevel optimization framework, $\boldsymbol{\Sigma}$ is learned to optimize generalization, after which standard training minimizes $\left\| \boldsymbol{Y}-g_\theta(\boldsymbol{X}) \right\|_{\boldsymbol{\Sigma}^{-1}}^2$. Empirically, QDF yields state-of-the-art or competitive improvements across diverse datasets and backbone forecasters, demonstrated via extensive ablations, generalization tests, and hyperparameter analyses, while remaining model-agnostic and computationally efficient during inference. This provides a principled, data-driven path to tailor learning objectives for multi-step forecasting with practical impact on forecasting accuracy and robustness.

Abstract

The design of the training objective is central to training time-series forecasting models. Existing training objectives such as mean squared error mostly treat each future step as an independent, equally weighted task, which we find leads to two issues: (1) they overlook the label autocorrelation effect among future steps, yielding a biased training objective; (2) they fail to set heterogeneous task weights for the forecasting tasks at different future steps, limiting forecasting performance. To fill this gap, we propose a novel quadratic-form weighted training objective that addresses both issues simultaneously. Specifically, the off-diagonal elements of the weighting matrix account for the label autocorrelation effect, whereas the non-uniform diagonal elements are expected to match the most preferable weights of the forecasting tasks at different future steps. To achieve this, we propose a Quadratic Direct Forecast (QDF) learning algorithm, which trains the forecast model using the adaptively updated quadratic-form weighting matrix. Experiments show that QDF effectively improves the performance of various forecast models, achieving state-of-the-art results. Code is available at https://anonymous.4open.science/r/QDF-8937.
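
To make the quadratic-form objective concrete, the following is a minimal NumPy sketch of the loss $(\boldsymbol{Y}-\hat{\boldsymbol{Y}})^\top \boldsymbol{\Sigma}^{-1} (\boldsymbol{Y}-\hat{\boldsymbol{Y}})$ for a single forecast. It is an illustration only, not the authors' implementation: the function name `quadratic_loss`, the toy values, and the hand-picked weighting matrices are all assumptions, and the weighting matrix is fixed here rather than learned as in QDF.

```python
import numpy as np

def quadratic_loss(y, y_hat, sigma):
    """Quadratic-form error (y - y_hat)^T sigma^{-1} (y - y_hat)."""
    err = y - y_hat
    # Solve sigma @ z = err rather than forming the explicit inverse.
    z = np.linalg.solve(sigma, err)
    return float(err @ z)

T = 4  # toy forecast horizon (hypothetical)
y = np.array([1.0, 2.0, 3.0, 4.0])      # toy label sequence
y_hat = np.array([0.5, 2.0, 3.5, 4.0])  # toy forecast sequence

# sigma = I: every step is independent and equally weighted,
# so the loss is the plain sum of squared errors.
identity_loss = quadratic_loss(y, y_hat, np.eye(T))

# Diagonal sigma: steps stay independent but receive different weights
# (a larger sigma[t, t] down-weights the error at step t).
diagonal_loss = quadratic_loss(y, y_hat, np.diag([1.0, 2.0, 4.0, 8.0]))

# Hypothetical AR(1)-style sigma with entries rho ** |i - j|: the
# off-diagonal entries couple the errors at different steps.
rho = 0.8
idx = np.arange(T)
sigma_ar1 = rho ** np.abs(idx[:, None] - idx[None, :])
correlated_loss = quadratic_loss(y, y_hat, sigma_ar1)

print(identity_loss, diagonal_loss, correlated_loss)
```

With the identity matrix the loss reduces to the unnormalized squared error, which is the sense in which a mean-squared-error objective corresponds to one particular choice of weighting matrix.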

Paper Structure

This paper contains 35 sections, 2 theorems, 7 equations, 10 figures, 9 tables, 2 algorithms.

Key Result

Theorem 3.1

Given a historical sequence $\boldsymbol{X}$, let $\boldsymbol{Y}\in\mathbb{R}^\mathrm{T}$ be the associated label sequence and $g_\theta(\boldsymbol{X})\in\mathbb{R}^\mathrm{T}$ be the forecast sequence. Assuming the forecast errors follow a multivariate Gaussian distribution, the NLL of the label sequence ... where $\boldsymbol{\Sigma}\in\mathbb{R}^{\mathrm{T}\times\mathrm{T}}$ is the conditional covariance ...
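
As a sketch under the stated Gaussian assumption (this is the standard multivariate Gaussian negative log-likelihood, not necessarily the paper's exact wording or normalization), taking the forecast $g_\theta(\boldsymbol{X})$ as the mean and $\boldsymbol{\Sigma}$ as the covariance gives:

$$
-\log p(\boldsymbol{Y}\mid\boldsymbol{X}) = \frac{1}{2}\left(\boldsymbol{Y}-g_\theta(\boldsymbol{X})\right)^\top \boldsymbol{\Sigma}^{-1}\left(\boldsymbol{Y}-g_\theta(\boldsymbol{X})\right) + \frac{1}{2}\log\det\boldsymbol{\Sigma} + \frac{\mathrm{T}}{2}\log(2\pi)
$$

For a fixed $\boldsymbol{\Sigma}$, the last two terms do not depend on $\theta$, so minimizing this NLL over $\theta$ is equivalent to minimizing $\left\| \boldsymbol{Y}-g_\theta(\boldsymbol{X}) \right\|_{\boldsymbol{\Sigma}^{-1}}^2$, which matches the training objective quoted in the TL;DR. Setting $\boldsymbol{\Sigma}$ to the identity matrix recovers the squared-error objective up to a constant factor.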

Figures (10)

  • Figure 1: Statistics of label components conditioned on $\boldsymbol{X}$, with a forecast horizon of $\mathrm{T}=96$. (a) Partial correlation and conditional variance estimated from the raw label sequence $\boldsymbol{Y}$, with colors indicating different $\boldsymbol{X}$. (b) Partial correlation matrices of label components extracted by FreDF and Time-o1 [wang2025iclrfredf, wang2025timeo1]. Calculation details are provided in Appendix A.
  • Figure 2: The forecast sequence of DF (in blue) and QDF (in red), with historical length $\mathrm{H}=96$.
  • Figure 3: Improvement of QDF applied to different forecast models, shown with colored bars for means over forecast lengths (96, 192, 336, 720) and error bars for 50% confidence intervals.
  • Figure 4: Impact of hyperparameters on the performance of QDF.
  • Figure 5: The label autocorrelation effect on the original label sequence and the components extracted by FreDF and Time-o1 [wang2025timeo1, wang2025iclrfredf]. The datasets are ETTh1, ETTh2, ECL, and Weather from left to right. The forecast length is uniformly set to 96.
  • ...and 5 more figures

Theorems & Definitions (4)

  • Theorem 3.1: Likelihood formulation
  • Definition 3.2
  • Theorem B.1: Likelihood formulation (Theorem 3.1 in the main text)
  • proof