EnTransformer: A Deep Generative Transformer for Multivariate Probabilistic Forecasting

Rajdeep Pathak, Rahul Goswami, Madhurima Panja, Palash Ghosh, Tanujit Chakraborty

Abstract

Reliable uncertainty quantification is critical in multivariate time series forecasting problems arising in domains such as energy systems and transportation networks. Although Transformer-based architectures have recently achieved strong performance for sequence modeling, most probabilistic forecasting approaches rely on restrictive parametric likelihoods or quantile-based objectives, which can struggle to capture complex joint predictive distributions across multiple correlated time series. This work proposes EnTransformer, a deep generative forecasting framework that integrates engression, a stochastic learning paradigm for modeling conditional distributions, with the expressive sequence modeling capabilities of Transformers. The proposed approach injects stochastic noise into the model representation and optimizes an energy-based scoring objective to directly learn the conditional predictive distribution without imposing parametric assumptions. This design enables EnTransformer to generate coherent multivariate forecast trajectories while preserving the Transformer's capacity to model long-range temporal dependencies and cross-series interactions. We evaluate EnTransformer on several widely used benchmarks for multivariate probabilistic forecasting, including the Electricity, Traffic, Solar, Taxi, KDD-cup, and Wikipedia datasets. Experimental results demonstrate that EnTransformer produces well-calibrated probabilistic forecasts and consistently outperforms the benchmark models.
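
To make the energy-based scoring objective concrete, the following is a minimal sketch of a sample-based energy score, assuming the standard form $\mathbb{E}\lVert X - y\rVert - \tfrac{1}{2}\mathbb{E}\lVert X - X'\rVert$ estimated from $M$ draws. The function name `energy_score`, the array shapes, and the toy Gaussian draws are illustrative assumptions and are not taken from the paper; in EnTransformer the draws would come from the noise-injected Transformer rather than from a fixed distribution.

```python
import numpy as np

def energy_score(samples, target):
    """Sample-based estimate of the energy score (lower is better).

    samples: (M, d) array, M >= 2 draws from a predictive distribution.
    target:  (d,) array, the observed outcome.
    Hypothetical helper for illustration, not the authors' implementation.
    """
    samples = np.asarray(samples, dtype=float)
    target = np.asarray(target, dtype=float)
    m = samples.shape[0]
    # Accuracy term: mean Euclidean distance between each draw and the observation.
    term_obs = np.linalg.norm(samples - target, axis=1).mean()
    # Spread term: mean Euclidean distance over pairs of distinct draws.
    diffs = samples[:, None, :] - samples[None, :, :]
    pairwise = np.linalg.norm(diffs, axis=2)
    term_spread = pairwise.sum() / (m * (m - 1))
    return term_obs - 0.5 * term_spread

# Toy comparison: draws centred on the observation versus draws shifted away from it.
rng = np.random.default_rng(0)
y = np.zeros(3)
centred = rng.normal(size=(200, 3))
shifted = centred + 5.0
score_centred = energy_score(centred, y)
score_shifted = energy_score(shifted, y)
```

In this toy setting the centred ensemble receives the lower score, which is the behaviour a training loss of this kind rewards: draws should sit close to the observation while remaining spread out enough to represent uncertainty.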

Paper Structure (25 sections, 9 equations, 11 figures, 5 tables, 2 algorithms)

Figures (11)

  • Figure 1: Overview of the EnTransformer architecture, consisting of noise injection, in-sample forecast generation, and optimization using the energy score loss. An ensemble of out-of-sample forecasts for all $D$ nodes for the next $q$ steps can be generated from the trained model by passing the $p$-length look-back window as input.
  • Figure 2: Forecasts produced by EnTransformer on selected nodes of the Traffic dataset on test window 5.
  • Figure 3: MCB test results based on $\operatorname{CRPS}_{\text{sum}}$ metric. On the y-axis, the notation $\nu$-$r$ indicates that model $\nu$ achieved an average rank of $r$.
  • Figure 4: Probability Integral Transform (PIT) Q-Q plots evaluating the predictive calibration of EnTransformer across the six datasets. The empirical quantiles (solid blue) are plotted against the theoretical quantiles of an ideal uniform distribution, $\mathcal{U}(0,1)$ (dashed black). The test window taken for each dataset corresponds to the per-dataset forecast figures (Solar through Wikipedia) in the appendix.
  • Figure 5: Ablation study analyzing the effect of the in-sample training ensemble size ($M$) on the proposed EnTransformer. (A) Computational overhead represented by the total training time in seconds; (B) Forecasting performance evaluated by the $\operatorname{CRPS}_{\text{sum}}$ metric across the six datasets. The circular points represent the mean values, and the vertical error lines indicate the standard deviation computed over 10 independent runs.
  • ...and 6 more figures
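
The PIT Q-Q diagnostic described in the Figure 4 caption can be sketched from ensemble forecasts as follows. This is a minimal illustration under stated assumptions: the helper names `pit_values` and `qq_points`, the array shapes, and the synthetic Gaussian data are hypothetical and do not reproduce the paper's evaluation code.

```python
import numpy as np

def pit_values(samples, observations):
    """Empirical PIT: fraction of ensemble draws at or below each observation.

    samples:      (N, M) array, M draws for each of N forecast targets.
    observations: (N,) array of realised values.
    """
    samples = np.asarray(samples, dtype=float)
    observations = np.asarray(observations, dtype=float)
    return (samples <= observations[:, None]).mean(axis=1)

def qq_points(pit):
    """Return (theoretical, empirical) quantiles for a PIT Q-Q plot against U(0, 1)."""
    empirical = np.sort(np.asarray(pit, dtype=float))
    n = empirical.size
    theoretical = (np.arange(1, n + 1) - 0.5) / n
    return theoretical, empirical

# Synthetic check: observations and ensemble draws share the same distribution,
# so the forecasts are calibrated by construction.
rng = np.random.default_rng(0)
n_targets, n_draws = 2000, 100
obs = rng.normal(size=n_targets)
ensemble = rng.normal(size=(n_targets, n_draws))
theoretical, empirical = qq_points(pit_values(ensemble, obs))
max_gap = np.abs(theoretical - empirical).max()
```

For a calibrated forecaster the empirical quantiles stay close to the diagonal, so `max_gap` is small; systematic departures from the diagonal indicate biased or mis-dispersed predictive distributions.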