Embracing Heteroscedasticity for Probabilistic Time Series Forecasting

Yijun Wang; Qiyuan Zhuang; Xiu-Shen Wei

Abstract

Probabilistic time series forecasting (PTSF) aims to model the full predictive distribution of future observations, enabling both accurate forecasting and principled uncertainty quantification. A central requirement of PTSF is to embrace heteroscedasticity, as real-world time series exhibit time-varying conditional variances induced by nonstationary dynamics, regime changes, and evolving external conditions. However, most existing non-autoregressive generative approaches to PTSF, such as TimeVAE and $K^2$VAE, rely on MSE-based training objectives that implicitly impose a homoscedastic assumption, thereby fundamentally limiting their ability to model temporal heteroscedasticity. To address this limitation, we propose the Location-Scale Gaussian VAE (LSG-VAE), a simple but effective framework that explicitly parameterizes both the predictive mean and time-dependent variance through a location-scale likelihood formulation. This design enables LSG-VAE to faithfully capture heteroscedastic aleatoric uncertainty and introduces an adaptive attenuation mechanism that automatically down-weights highly volatile observations during training, leading to improved robustness in trend prediction. Extensive experiments on nine benchmark datasets demonstrate that LSG-VAE consistently outperforms fifteen strong generative baselines while maintaining high computational efficiency suitable for real-time deployment.
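To make the contrast between an MSE objective and a location-scale likelihood concrete, the following is a minimal sketch, not the authors' implementation. It assumes the standard per-step Gaussian negative log-likelihood, $\tfrac{1}{2}\left[\log\sigma_t^2 + (y_t-\mu_t)^2/\sigma_t^2\right]$ with the additive constant dropped, and a log-variance parameterization. The function names (`mse_loss`, `gaussian_nll`, `residual_weight`) are hypothetical and chosen only for illustration. With unit variance at every step the likelihood reduces to half the MSE, which is the implicit homoscedastic assumption the abstract describes. With a learned variance, each squared residual is scaled by $1/\sigma_t^2$, so steps predicted to be volatile contribute less to the fit of the mean.

```python
import math


def mse_loss(y, mu):
    """Mean squared error over a forecast horizon."""
    return sum((yi - mi) ** 2 for yi, mi in zip(y, mu)) / len(y)


def gaussian_nll(y, mu, log_var):
    """Mean Gaussian negative log-likelihood with a per-step variance.

    Each term is 0.5 * (log_var_t + (y_t - mu_t)^2 / exp(log_var_t));
    the additive constant 0.5 * log(2 * pi) is dropped.
    Predicting log-variance (an assumption of this sketch) keeps the
    variance positive without an explicit constraint.
    """
    terms = [
        0.5 * (lv + (yi - mi) ** 2 / math.exp(lv))
        for yi, mi, lv in zip(y, mu, log_var)
    ]
    return sum(terms) / len(terms)


def residual_weight(log_var):
    """Factor 1 / sigma_t^2 that multiplies each squared residual in the NLL."""
    return [math.exp(-lv) for lv in log_var]


# Toy horizon (made-up numbers): the third observation is a volatile spike.
y = [1.0, 1.1, 5.0, 0.9]
mu = [1.0, 1.0, 1.0, 1.0]

# Homoscedastic case: unit variance everywhere, so the NLL is half the MSE.
unit_log_var = [0.0, 0.0, 0.0, 0.0]
homoscedastic = gaussian_nll(y, mu, unit_log_var)

# Heteroscedastic case: a larger predicted variance at the spike.
spike_log_var = [0.0, 0.0, math.log(16.0), 0.0]
heteroscedastic = gaussian_nll(y, mu, spike_log_var)
weights = residual_weight(spike_log_var)
```

In this toy case the spike's squared residual is weighted by 1/16 instead of 1, while the `log_var` term penalizes inflating the variance everywhere, so the model cannot trivially lower the loss by predicting large uncertainty at every step.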

Paper Structure (62 sections, 22 equations, 7 figures, 9 tables)

Introduction
Methodology
Problem Formulation and Motivation
Probabilistic Forecasting Formulation
The Homoscedastic Trap in Vanilla VAEs
The LSG-VAE Framework
Patching and Variational Encoding
Non-autoregressive Latent Dynamics
Location-Scale Probabilistic Decoding
Objective Function: Embracing Heteroscedasticity
Experiments on Synthetic Data
Data Generating Process
Results and Analysis
Experiments on Real-World Data
Evaluation under the $K^2$VAE Protocol
...and 47 more sections

Figures (7)

Figure 1: Homoscedastic vs. Heteroscedastic Forecasting. (a) Standard MSE-based models assume constant variance, leading the mean predictor to fit transient fluctuations. (b) LSG-VAE models time-varying uncertainty, allowing the mean prediction to focus on the underlying trend.
Figure 2: Overview of the LSG-VAE Framework. The model segments the input series into patches, encodes them into a probabilistic latent space, evolves the state via efficient global dynamics, and decodes the future using a dual-head mechanism to jointly predict location ($\mu$) and scale ($\sigma$).
Figure 3: Qualitative results on synthetic datasets. Left: forecasting with adaptive 95% confidence intervals under varying volatility. Right: uncertainty recovery, where the predicted volatility $\hat{\sigma}_t$ closely matches the ground truth.
Figure 4: Visual comparison of probabilistic forecasting on ETTh1. All subfigures share a unified Y-axis scale. Additional visualization results are provided in Appendix \ref{showcases}.
Figure 5: Efficiency vs. Performance comparison on ETTh2 ($L=96, H=720$). The bubble size indicates inference memory usage (smaller is better).
...and 2 more figures
