Autoregressive Denoising Diffusion Models for Multivariate Probabilistic Time Series Forecasting
Kashif Rasul, Calvin Seward, Ingmar Schuster, Roland Vollgraf
TL;DR
This work introduces TimeGrad, an autoregressive denoising diffusion model for multivariate probabilistic time series forecasting that learns per-step conditional distributions by denoising diffused observations. It combines an RNN hidden state with a diffusion emission head and is trained by optimizing a variational bound on the likelihood. At inference time it uses Langevin-like sampling to generate multiple future trajectories for uncertainty quantification. The method achieves state-of-the-art CRPS_sum on six diverse real-world datasets, demonstrating strong probabilistic forecasting performance in high-dimensional settings. The paper also discusses ablations, scaling techniques, covariate integration, and future directions such as faster sampling and extensions to more advanced architectures.
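The training side of this description can be sketched in a few lines. The sketch is a minimal pure-Python illustration under stated assumptions, not the authors' implementation. The linear noise schedule, the toy `rnn_step` recurrence, the toy `eps_theta` denoiser, and the choice to give the hidden state the same size as the observations are all hypothetical stand-ins for the learned RNN and denoising network. The loss is the simplified noise-prediction form of the variational bound popularized by Ho et al. (2020). The sketch shows the per-time-step structure: condition on h_{t-1}, diffuse the observation x_t^0 to a random step n in closed form, score the noise prediction, then update the RNN with the observed x_t.

```python
import math
import random

# Hypothetical linear noise schedule; the values are illustrative assumptions.
N_STEPS = 50
BETAS = [1e-4 + (0.1 - 1e-4) * i / (N_STEPS - 1) for i in range(N_STEPS)]
ALPHA_BARS = []  # ALPHA_BARS[n] = prod_{i <= n} (1 - BETAS[i])
_prod = 1.0
for _b in BETAS:
    _prod *= 1.0 - _b
    ALPHA_BARS.append(_prod)


def rnn_step(x, h):
    """Toy stand-in for the RNN cell: elementwise h_t = tanh(0.5 * x_t + 0.5 * h_{t-1})."""
    return [math.tanh(0.5 * xi + 0.5 * hi) for xi, hi in zip(x, h)]


def eps_theta(x_n, h, n):
    """Toy stand-in for the denoising network epsilon_theta(x^n, h, n)."""
    scale = 1.0 + n / N_STEPS
    return [0.1 * scale * xi - 0.1 * hi for xi, hi in zip(x_n, h)]


def sequence_loss(series, rng):
    """Average noise-prediction loss over a series of D-dimensional observations.

    At each time step the diffusion head is conditioned on h_{t-1}, the RNN state
    summarizing x_1, ..., x_{t-1}; the RNN then consumes x_t to produce h_t.
    """
    dim = len(series[0])
    h = [0.0] * dim  # h_0; same size as x here, which is an assumption
    total = 0.0
    for x0 in series:
        n = rng.randrange(N_STEPS)  # diffusion step drawn uniformly
        eps = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        a_bar = ALPHA_BARS[n]
        # Closed-form forward diffusion: x^n = sqrt(a_bar) * x^0 + sqrt(1 - a_bar) * eps
        x_n = [math.sqrt(a_bar) * xi + math.sqrt(1.0 - a_bar) * ei
               for xi, ei in zip(x0, eps)]
        pred = eps_theta(x_n, h, n)
        total += sum((e - p) ** 2 for e, p in zip(eps, pred))
        h = rnn_step(x0, h)  # condition the next step on the observed x_t
    return total / len(series)


series = [[0.1, 0.2], [0.3, -0.1], [0.0, 0.5]]
loss = sequence_loss(series, random.Random(0))
```

In a real model this quantity would be minimized with respect to the network parameters by stochastic gradient descent; the sketch only evaluates it, and averaging over time steps is just a rescaling of the summed objective.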
Abstract
In this work, we propose TimeGrad, an autoregressive model for multivariate probabilistic time series forecasting which samples from the data distribution at each time step by estimating its gradient. To this end, we use diffusion probabilistic models, a class of latent variable models closely connected to score matching and energy-based methods. Our model learns gradients by optimizing a variational bound on the data likelihood and at inference time converts white noise into a sample of the distribution of interest through a Markov chain using Langevin sampling. We demonstrate experimentally that the proposed autoregressive denoising diffusion model is the new state-of-the-art multivariate probabilistic forecasting method on real-world data sets with thousands of correlated dimensions. We hope that this method is a useful tool for practitioners and lays the foundation for future research in this area.
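The inference procedure described above (white noise turned into a sample by a Markov chain, repeated autoregressively to obtain several future trajectories) can be sketched as follows. This is a minimal pure-Python illustration, not the authors' code. The noise schedule, the toy `rnn_step` and `eps_theta` functions, and the helper names `sample_step` and `sample_trajectories` are hypothetical. The update is the DDPM ancestral-sampling step of Ho et al. (2020). It has the Langevin-like structure the abstract refers to: since eps_theta approximates a scaled negative score, each step moves along an estimated gradient of the log density and then adds Gaussian noise. For the noise scale we assume sigma_n^2 = beta_tilde_n, one of the two variance choices in Ho et al.

```python
import math
import random

# Hypothetical linear noise schedule (an assumption, not the paper's settings).
N_STEPS = 50
BETAS = [1e-4 + (0.1 - 1e-4) * i / (N_STEPS - 1) for i in range(N_STEPS)]
ALPHAS = [1.0 - b for b in BETAS]
ALPHA_BARS = []  # ALPHA_BARS[n] = prod_{i <= n} ALPHAS[i]
_prod = 1.0
for _a in ALPHAS:
    _prod *= _a
    ALPHA_BARS.append(_prod)


def rnn_step(x, h):
    """Toy stand-in for the RNN cell (elementwise tanh recurrence)."""
    return [math.tanh(0.5 * xi + 0.5 * hi) for xi, hi in zip(x, h)]


def eps_theta(x_n, h, n):
    """Toy stand-in for a trained denoising network epsilon_theta(x^n, h, n)."""
    return [0.1 * xi - 0.1 * hi for xi, hi in zip(x_n, h)]


def sample_step(h, rng):
    """Run the reverse Markov chain x^N -> ... -> x^0, starting from white noise."""
    x = [rng.gauss(0.0, 1.0) for _ in h]  # x^N ~ N(0, I)
    for n in reversed(range(N_STEPS)):  # 0-based index n is diffusion step n + 1
        eps = eps_theta(x, h, n)
        coef = BETAS[n] / math.sqrt(1.0 - ALPHA_BARS[n])
        mean = [(xi - coef * ei) / math.sqrt(ALPHAS[n]) for xi, ei in zip(x, eps)]
        if n > 0:
            # sigma^2 = beta_tilde = (1 - a_bar_prev) / (1 - a_bar) * beta
            var = (1.0 - ALPHA_BARS[n - 1]) / (1.0 - ALPHA_BARS[n]) * BETAS[n]
            x = [m + math.sqrt(var) * rng.gauss(0.0, 1.0) for m in mean]
        else:
            x = mean  # no noise is added at the final step
    return x


def sample_trajectories(history, horizon, num_samples, seed=0):
    """Draw num_samples future paths, each `horizon` steps of D-dimensional vectors."""
    rng = random.Random(seed)
    h = [0.0] * len(history[0])
    for x in history:  # warm up the hidden state on the observed context
        h = rnn_step(x, h)
    paths = []
    for _ in range(num_samples):
        h_s, path = list(h), []
        for _ in range(horizon):
            x = sample_step(h_s, rng)
            path.append(x)
            h_s = rnn_step(x, h_s)  # feed the sample back autoregressively
        paths.append(path)
    return paths


history = [[0.1, 0.2, 0.3], [0.0, -0.1, 0.2]]
paths = sample_trajectories(history, horizon=4, num_samples=5, seed=1)
```

Each path continues from the same conditioning state but draws its own noise, so the spread across `paths` is what provides the empirical uncertainty estimate (for example, quantiles per time step and dimension).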
