DistDF: Time-Series Forecasting Needs Joint-Distribution Wasserstein Alignment

Hao Wang, Licheng Pan, Yuan Lu, Zhixuan Chu, Xiaoxi Li, Shuting He, Zhichao Chen, Haoxuan Li, Qingsong Wen, Zhouchen Lin

TL;DR

This work identifies autocorrelation-induced bias in traditional time-series forecasting objectives and introduces DistDF, a plug-in training framework that aligns the conditional forecast and label distributions via a joint-distribution Wasserstein discrepancy. The authors prove that the joint discrepancy upper-bounds the conditional discrepancy, so minimizing the former also controls the latter, and they provide a tractable, differentiable estimator (with a Gaussian special case based on the Bures–Wasserstein distance). Empirically, DistDF consistently improves state-of-the-art forecast models across diverse benchmarks, with ablations showing the complementary roles of mean and covariance alignment. The approach offers a principled, generalizable path for distribution-aware forecasting with theoretical guarantees and practical performance gains.
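
For reference, the Bures–Wasserstein distance behind the Gaussian special case has a standard closed form. The symbols below ($\mu_i$, $\Sigma_i$ for the means and covariances of two Gaussians) are generic notation chosen for this note and are not taken from the paper. The first term compares means and the second compares covariances, which is one way to read the "mean and covariance alignment" ablation mentioned in the TL;DR.

$$
W_2^2\big(\mathcal{N}(\mu_1,\Sigma_1),\,\mathcal{N}(\mu_2,\Sigma_2)\big)
= \|\mu_1-\mu_2\|_2^2
+ \mathrm{tr}\Big(\Sigma_1+\Sigma_2-2\big(\Sigma_1^{1/2}\,\Sigma_2\,\Sigma_1^{1/2}\big)^{1/2}\Big)
$$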

Abstract

Training time-series forecast models requires aligning the conditional distribution of model forecasts with that of the label sequence. The standard direct forecast (DF) approach minimizes the conditional negative log-likelihood of the label sequence, typically estimated using the mean squared error. However, this estimate is biased in the presence of label autocorrelation. In this paper, we propose DistDF, which instead achieves alignment by minimizing a discrepancy between the conditional forecast and label distributions. Because conditional discrepancies are difficult to estimate from finite time-series observations, we introduce a joint-distribution Wasserstein discrepancy for time-series forecasting, which provably upper-bounds the conditional discrepancy of interest. This discrepancy admits tractable, differentiable estimation from empirical samples and integrates seamlessly with gradient-based training. Extensive experiments show that DistDF improves the performance of diverse forecast models and achieves state-of-the-art forecasting performance. Code is available at https://anonymous.4open.science/r/DistDF-F66B.
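
To make the idea of a joint-distribution discrepancy concrete, here is a minimal NumPy sketch under a Gaussian approximation. It is not the authors' implementation: the function names (`psd_sqrt`, `bures_wasserstein_sq`, `joint_discrepancy`), the synthetic data, and the choice to fit Gaussians to the concatenated (history, label) and (history, forecast) samples are all assumptions made for illustration.

```python
import numpy as np

def psd_sqrt(mat):
    """Square root of a symmetric PSD matrix via eigendecomposition."""
    sym = (mat + mat.T) / 2.0                  # remove tiny asymmetries
    eigvals, eigvecs = np.linalg.eigh(sym)
    eigvals = np.clip(eigvals, 0.0, None)      # guard against small negatives
    return (eigvecs * np.sqrt(eigvals)) @ eigvecs.T

def bures_wasserstein_sq(samples_p, samples_q):
    """Squared 2-Wasserstein distance between Gaussian fits of two sample sets.

    Rows are samples, columns are dimensions.
    """
    mu_p, mu_q = samples_p.mean(axis=0), samples_q.mean(axis=0)
    cov_p = np.cov(samples_p, rowvar=False)
    cov_q = np.cov(samples_q, rowvar=False)
    root_p = psd_sqrt(cov_p)
    cross = psd_sqrt(root_p @ cov_q @ root_p)
    mean_term = float(np.sum((mu_p - mu_q) ** 2))
    cov_term = float(np.trace(cov_p + cov_q - 2.0 * cross))
    return mean_term + max(cov_term, 0.0)      # clip round-off below zero

def joint_discrepancy(history, labels, forecasts):
    """Compare the joint samples (X, Y) and (X, Y_hat) instead of conditionals."""
    joint_label = np.concatenate([history, labels], axis=1)
    joint_forecast = np.concatenate([history, forecasts], axis=1)
    return bures_wasserstein_sq(joint_label, joint_forecast)

# Hypothetical toy data: 512 windows, history length 4, horizon 3.
rng = np.random.default_rng(0)
history = rng.normal(size=(512, 4))
labels = history[:, -3:] + 0.5 * rng.normal(size=(512, 3))

print(joint_discrepancy(history, labels, labels.copy()))  # forecasts equal labels
print(joint_discrepancy(history, labels, labels + 1.0))   # forecasts biased by +1
```

NumPy is not differentiable, so this only evaluates the quantity; using it as a training loss would require rewriting the same steps in an autodiff framework.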

Paper Structure

This paper contains 38 sections, 8 theorems, 25 equations, 8 figures, 11 tables, 1 algorithm.

Key Result

Theorem 3.1

Suppose $Y_{|X}\in\mathbb{R}^{\mathrm{T}}$ is the label sequence given the historical sequence $X$, $\hat{Y}_{|X}\in\mathbb{R}^{\mathrm{T}}$ is the forecast sequence, and $\Sigma_{|X}\in\mathbb{R}^{\mathrm{T}\times\mathrm{T}}$ is the conditional covariance of $Y_{|X}$. The bias of MSE from the negative log-likelihood where $\|v\|_{\Sigma^{-1}_{|X}}^2=v^\top\Sigma^{-1}_{|X}v$. It vanishes if the conditional covarian
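
The role of the covariance-weighted norm can be checked numerically. The sketch below compares the plain squared error $\|v\|^2$ with the quadratic form $v^\top\Sigma^{-1}v$ (the residual-dependent part of a Gaussian negative log-likelihood, up to constants) for a residual $v$. The AR(1)-style covariance, the value of `rho`, and the function names are assumptions made for illustration. The sign convention of the "gap" is also a choice made here and need not match the paper's definition of the bias.

```python
import numpy as np

def ar1_covariance(horizon, rho):
    """Hypothetical autocorrelated covariance: entry (i, j) equals rho**|i - j|."""
    idx = np.arange(horizon)
    return rho ** np.abs(idx[:, None] - idx[None, :])

def squared_error(residual):
    """Plain squared error ||v||^2, the quantity MSE averages."""
    return float(residual @ residual)

def mahalanobis_sq(residual, cov):
    """Covariance-weighted norm v^T cov^{-1} v, computed with a linear solve."""
    return float(residual @ np.linalg.solve(cov, residual))

residual = np.array([1.0, 1.0, 1.0, 1.0])  # label minus forecast, horizon 4

# Identity covariance: the two quantities coincide, so the gap is zero.
gap_identity = mahalanobis_sq(residual, np.eye(4)) - squared_error(residual)

# Autocorrelated labels: the two quantities differ.
gap_correlated = (
    mahalanobis_sq(residual, ar1_covariance(4, 0.8)) - squared_error(residual)
)

print(gap_identity, gap_correlated)
```

With an identity covariance the two quantities agree exactly, while off-diagonal correlation makes the squared error weight the residual differently from the likelihood.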

Figures (8)

  • Figure 1: The conditional correlation of label components given $X$, where the forecast horizon is set to $\mathrm{T}=192$. The correlation matrices are computed for the raw labels (a), the frequency components in FreDF (b) [wang2025iclrfredf], and the principal components in Time-o1 (c) [wang2025timeo1].
  • Figure 2: The forecast sequence of DF (in blue) and DistDF (in red), with historical length $\mathrm{H}=96$.
  • Figure 3: Improvement of DistDF applied to different forecast models, shown with colored bars for means over forecast lengths (96, 192, 336, 720) and error bars for 50% confidence intervals.
  • Figure 4: The forecast sequences generated with DF and DistDF. The forecast length is set to 336 and the experiment is conducted on ETTm2.
  • Figure 5: The forecast sequences generated with DF and DistDF. The forecast length is set to 192 and the experiment is conducted on ECL.
  • ...and 3 more figures

Theorems & Definitions (15)

  • Theorem 3.1: Autocorrelation bias
  • Definition 3.2: Wasserstein discrepancy
  • Lemma 3.3: conditionalwass
  • Theorem 3.4: Alignment property
  • Lemma 3.5: cot
  • Theorem A.1: Autocorrelation bias, Theorem 3.1 in the main text
  • proof
  • Lemma A.2: Lemma 3.3 in the main text
  • proof
  • Theorem A.3: Alignment property, Theorem 3.4 in the main text
  • ...and 5 more