DistDF: Time-Series Forecasting Needs Joint-Distribution Wasserstein Alignment

Hao Wang; Licheng Pan; Yuan Lu; Zhixuan Chu; Xiaoxi Li; Shuting He; Zhichao Chen; Haoxuan Li; Qingsong Wen; Zhouchen Lin

TL;DR

This work identifies an autocorrelation-induced bias in standard time-series forecasting objectives and introduces DistDF, a plug-in training framework that aligns the conditional forecast and label distributions by minimizing a joint-distribution Wasserstein discrepancy. The authors prove that the joint discrepancy upper-bounds the conditional discrepancy, so minimizing the former also controls the latter, and they provide a tractable, differentiable estimator (with a Gaussian special case based on the Bures–Wasserstein distance). Empirically, DistDF consistently improves state-of-the-art forecast models across diverse benchmarks, and ablations show that mean alignment and covariance alignment play complementary roles. The approach offers a principled, model-agnostic route to distribution-aware forecasting, backed by the upper-bound guarantee and the reported empirical gains.

Abstract

Training time-series forecast models requires aligning the conditional distribution of model forecasts with that of the label sequence. The standard direct forecast (DF) approach minimizes the conditional negative log-likelihood of the label sequence, typically estimated using the mean squared error. However, this estimate is biased in the presence of label autocorrelation. In this paper, we propose DistDF, which achieves alignment by instead minimizing a discrepancy between the conditional forecast and label distributions. Because conditional discrepancies are difficult to estimate from finite time-series observations, we introduce a joint-distribution Wasserstein discrepancy for time-series forecasting, which provably upper-bounds the conditional discrepancy of interest. This discrepancy admits tractable, differentiable estimation from empirical samples and integrates seamlessly with gradient-based training. Extensive experiments show that DistDF improves the performance of diverse forecast models and achieves state-of-the-art forecasting performance. Code is available at https://anonymous.4open.science/r/DistDF-F66B.
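
To make the idea of a joint-distribution discrepancy concrete, the sketch below evaluates the squared 2-Wasserstein distance between two Gaussians fitted to the joint samples (history, label) and (history, forecast), using the standard closed form (the Bures–Wasserstein distance). It is an illustration under stated assumptions, not the authors' implementation: the function names (`psd_sqrt`, `bures_wasserstein_sq`, `joint_discrepancy`), the array shapes, and the unweighted sum of the mean and covariance terms are all hypothetical choices made for this example. It uses NumPy and therefore only evaluates the quantity; using it as a training loss would require re-expressing the same operations in an autodiff framework.

```python
import numpy as np


def psd_sqrt(mat):
    """Symmetric square root of a positive semi-definite matrix."""
    sym = (mat + mat.T) / 2.0                    # remove floating-point asymmetry
    eigvals, eigvecs = np.linalg.eigh(sym)
    eigvals = np.clip(eigvals, 0.0, None)        # guard against tiny negative eigenvalues
    return (eigvecs * np.sqrt(eigvals)) @ eigvecs.T


def bures_wasserstein_sq(a, b):
    """Squared 2-Wasserstein distance between Gaussians fitted to a and b.

    a, b: arrays of shape (num_samples, dim); rows are samples.
    Returns the mean term and the covariance term separately.
    """
    mean_a, mean_b = a.mean(axis=0), b.mean(axis=0)
    cov_a = np.atleast_2d(np.cov(a, rowvar=False))
    cov_b = np.atleast_2d(np.cov(b, rowvar=False))
    root_a = psd_sqrt(cov_a)
    cross = psd_sqrt(root_a @ cov_b @ root_a)
    mean_term = float(np.sum((mean_a - mean_b) ** 2))
    cov_term = float(np.trace(cov_a + cov_b - 2.0 * cross))
    return mean_term, max(cov_term, 0.0)         # clamp round-off below zero


def joint_discrepancy(history, labels, forecasts):
    """Discrepancy between the joint samples (history, labels) and (history, forecasts).

    Assumption for this sketch: the two terms are summed with equal weight.
    """
    joint_label = np.concatenate([history, labels], axis=1)
    joint_forecast = np.concatenate([history, forecasts], axis=1)
    mean_term, cov_term = bures_wasserstein_sq(joint_label, joint_forecast)
    return mean_term + cov_term


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    history = rng.normal(size=(512, 4))          # 512 windows, 4 past steps
    labels = 0.8 * history[:, -3:] + 0.1 * rng.normal(size=(512, 3))
    biased = labels + 0.5                        # forecasts with a constant offset
    print("perfect forecast:", joint_discrepancy(history, labels, labels))
    print("offset forecast: ", joint_discrepancy(history, labels, biased))
```

Returning the mean and covariance terms separately makes it easy to inspect how much of the discrepancy comes from each, which is the kind of split the ablations in the TL;DR refer to. A constant offset in the forecasts shows up only in the mean term, while a forecast with the wrong spread or the wrong dependence on the history shows up in the covariance term.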
