Embracing Heteroscedasticity for Probabilistic Time Series Forecasting

Yijun Wang, Qiyuan Zhuang, Xiu-Shen Wei

Abstract

Probabilistic time series forecasting (PTSF) aims to model the full predictive distribution of future observations, enabling both accurate forecasting and principled uncertainty quantification. A central requirement of PTSF is to embrace heteroscedasticity, as real-world time series exhibit time-varying conditional variances induced by nonstationary dynamics, regime changes, and evolving external conditions. However, most existing non-autoregressive generative approaches to PTSF, such as TimeVAE and $K^2$VAE, rely on MSE-based training objectives that implicitly impose a homoscedastic assumption, thereby fundamentally limiting their ability to model temporal heteroscedasticity. To address this limitation, we propose the Location-Scale Gaussian VAE (LSG-VAE), a simple but effective framework that explicitly parameterizes both the predictive mean and time-dependent variance through a location-scale likelihood formulation. This design enables LSG-VAE to faithfully capture heteroscedastic aleatoric uncertainty and introduces an adaptive attenuation mechanism that automatically down-weights highly volatile observations during training, leading to improved robustness in trend prediction. Extensive experiments on nine benchmark datasets demonstrate that LSG-VAE consistently outperforms fifteen strong generative baselines while maintaining high computational efficiency suitable for real-time deployment.
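
To illustrate the location-scale formulation described above, here is a minimal sketch. It is not the authors' implementation. It assumes a per-time-step Gaussian likelihood $\mathcal{N}(y_t; \mu_t, \sigma_t^2)$, and the helper names (`mse`, `gaussian_nll`, `grad_wrt_mu`) are hypothetical. With a constant $\sigma_t = 1$, the negative log-likelihood equals half the MSE plus a constant, which is the homoscedastic assumption the abstract attributes to MSE-based training. With a learned $\sigma_t$, the gradient with respect to $\mu_t$ is $-(y_t - \mu_t)/\sigma_t^2$, so a time step assigned a larger variance pulls less on the mean. This is one plausible reading of the "adaptive attenuation" mechanism.

```python
import math
from typing import Sequence

HALF_LOG_2PI = 0.5 * math.log(2 * math.pi)


def mse(y: Sequence[float], mu: Sequence[float]) -> float:
    """Mean squared error: implicitly assumes one shared (constant) variance."""
    return sum((yt - mt) ** 2 for yt, mt in zip(y, mu)) / len(y)


def gaussian_nll(y: Sequence[float], mu: Sequence[float], sigma: Sequence[float]) -> float:
    """Mean per-step negative log-likelihood under N(mu_t, sigma_t^2) (hypothetical helper)."""
    total = 0.0
    for yt, mt, st in zip(y, mu, sigma):
        total += HALF_LOG_2PI + math.log(st) + (yt - mt) ** 2 / (2 * st ** 2)
    return total / len(y)


def grad_wrt_mu(yt: float, mt: float, st: float) -> float:
    """Derivative of one step's NLL w.r.t. mu_t: the residual scaled by 1 / sigma_t^2."""
    return -(yt - mt) / st ** 2


# Toy series: the last step is a volatile spike that the mean should not chase.
y = [1.0, 1.2, 5.0]
mu = [1.0, 1.0, 1.0]

# Constant sigma = 1: NLL minus its constant equals MSE / 2 (homoscedastic case).
print(gaussian_nll(y, mu, [1.0, 1.0, 1.0]) - HALF_LOG_2PI, mse(y, mu) / 2)

# Attenuation: a larger sigma on the spike shrinks its pull on the mean.
print(grad_wrt_mu(5.0, 1.0, 1.0), grad_wrt_mu(5.0, 1.0, 2.0))
```

In this toy example, raising the spike's scale from 1 to 2 shrinks its gradient on $\mu_t$ from -4.0 to -1.0. The $\log \sigma_t$ term penalizes inflating the variance, so the model cannot lower the loss by making every scale large.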

Paper Structure (62 sections, 22 equations, 7 figures, 9 tables)

Figures (7)

  • Figure 1: Homoscedastic vs. Heteroscedastic Forecasting. (a) Standard MSE-based models assume constant variance, leading the mean predictor to fit transient fluctuations. (b) LSG-VAE models time-varying uncertainty, allowing the mean prediction to focus on the underlying trend.
  • Figure 2: Overview of the LSG-VAE Framework. The model segments the input series into patches, encodes them into a probabilistic latent space, evolves the state via efficient global dynamics, and decodes the future using a dual-head mechanism to jointly predict location ($\mu$) and scale ($\sigma$).
  • Figure 3: Qualitative results on synthetic datasets. Left: forecasting with adaptive 95% confidence intervals under varying volatility. Right: uncertainty recovery, where the predicted volatility $\hat{\sigma}_t$ closely matches the ground truth.
  • Figure 4: Visual comparison of probabilistic forecasting on ETTh1. All subfigures share a unified Y-axis scale. Additional visualization results are provided in the Appendix.
  • Figure 5: Efficiency vs. performance comparison on ETTh2 (input length $L=96$, forecast horizon $H=720$). Bubble size indicates inference memory usage (smaller is better).
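
Figures 2 and 3 describe a dual-head decoder that outputs a location $\mu$ and a scale $\sigma$, together with adaptive 95% confidence intervals. The sketch below illustrates that output stage under stated assumptions. The two heads are plain linear maps on a hidden vector. Positivity of $\sigma$ is enforced with a softplus plus a small floor. Intervals are Gaussian, $\mu_t \pm z_{0.975}\,\sigma_t$. The function names (`dual_head`, `gaussian_interval`) and the softplus choice are hypothetical; the paper's actual parameterization may differ.

```python
import math
from statistics import NormalDist
from typing import List, Sequence, Tuple

# Two-sided 95% quantile of the standard normal (about 1.96).
Z_95 = NormalDist().inv_cdf(0.975)


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x)); may underflow to 0 for very negative x."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def dual_head(h: Sequence[float],
              w_mu: Sequence[float], b_mu: float,
              w_sigma: Sequence[float], b_sigma: float,
              eps: float = 1e-6) -> Tuple[float, float]:
    """Hypothetical dual head: one linear map for location, one for scale."""
    mu = sum(w * x for w, x in zip(w_mu, h)) + b_mu
    raw = sum(w * x for w, x in zip(w_sigma, h)) + b_sigma
    sigma = softplus(raw) + eps  # eps keeps sigma strictly positive even on underflow
    return mu, sigma


def gaussian_interval(mu: Sequence[float], sigma: Sequence[float],
                      z: float = Z_95) -> List[Tuple[float, float]]:
    """Per-step (lower, upper) bounds mu_t - z * sigma_t and mu_t + z * sigma_t."""
    return [(m - z * s, m + z * s) for m, s in zip(mu, sigma)]


# Same mean, different predicted volatility: the interval widens with sigma.
for lo, hi in gaussian_interval([0.0, 0.0], [0.5, 2.0]):
    print(f"[{lo:.2f}, {hi:.2f}]")
```

Because the bounds scale linearly with $\sigma_t$, steps the model deems volatile get wider intervals while the mean stays on the trend. This is the qualitative behavior described in the captions of Figures 1 and 3.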