Multivariate Probabilistic Time Series Forecasting with Correlated Errors

Vincent Zhihao Zheng, Lijun Sun

Abstract

Accurately modeling the correlation structure of errors is critical for reliable uncertainty quantification in probabilistic time series forecasting. While recent deep learning models for multivariate time series have developed efficient parameterizations for time-varying contemporaneous covariance, they often assume temporal independence of errors for simplicity. However, real-world data often exhibit significant error autocorrelation and cross-lag correlation due to factors such as missing covariates. In this paper, we introduce a plug-and-play method that learns the covariance structure of errors over multiple steps for autoregressive models with Gaussian-distributed errors. To ensure scalable inference and computational efficiency, we model the contemporaneous covariance using a low-rank-plus-diagonal parameterization and capture cross-covariance through a group of independent latent temporal processes. The learned covariance matrix is then used to calibrate predictions based on observed residuals. We evaluate our method on probabilistic models built on RNN and Transformer architectures, and the results confirm that our approach improves predictive accuracy and uncertainty quantification without significantly increasing the number of parameters.
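The abstract attributes scalable inference to a low-rank-plus-diagonal parameterization of the contemporaneous covariance. The following is a minimal NumPy sketch, not the authors' implementation, assuming $\boldsymbol{\Sigma}_t = \boldsymbol{L}_t \boldsymbol{L}_t^\top + \operatorname{diag}(\boldsymbol{d}_t)$ with a $B \times R$ covariance factor $\boldsymbol{L}_t$ and positive variances $\boldsymbol{d}_t$. It evaluates the Gaussian log-likelihood of a residual vector without forming the $B \times B$ matrix, using the Woodbury identity and the matrix determinant lemma; the function name `lowrank_diag_logpdf` is hypothetical.

```python
import numpy as np


def lowrank_diag_logpdf(eta, L, d):
    """Log-density of eta ~ N(0, L @ L.T + diag(d)) without forming the B x B matrix.

    eta: (B,) residual vector; L: (B, R) covariance factor; d: (B,) positive variances.
    """
    B, R = L.shape
    Dinv_eta = eta / d                       # D^{-1} eta
    Dinv_L = L / d[:, None]                  # D^{-1} L
    # Capacitance matrix K = I_R + L^T D^{-1} L (only R x R)
    K = np.eye(R) + L.T @ Dinv_L
    chol = np.linalg.cholesky(K)             # K = chol @ chol.T
    # Woodbury: eta^T Sigma^{-1} eta = eta^T D^{-1} eta - v^T K^{-1} v, with v = L^T D^{-1} eta
    v = L.T @ Dinv_eta
    w = np.linalg.solve(chol, v)             # w^T w = v^T K^{-1} v
    quad = eta @ Dinv_eta - w @ w
    # Matrix determinant lemma: log|Sigma| = log|K| + sum(log d)
    logdet = 2.0 * np.sum(np.log(np.diag(chol))) + np.sum(np.log(d))
    return -0.5 * (B * np.log(2.0 * np.pi) + logdet + quad)


# Check against the dense O(B^3) computation on random inputs.
rng = np.random.default_rng(0)
B, R = 8, 2
L = rng.normal(size=(B, R))
d = rng.uniform(0.1, 1.0, size=B)
eta = rng.normal(size=B)
Sigma = L @ L.T + np.diag(d)
_, logdet = np.linalg.slogdet(Sigma)
dense = -0.5 * (B * np.log(2 * np.pi) + logdet + eta @ np.linalg.solve(Sigma, eta))
print(np.isclose(lowrank_diag_logpdf(eta, L, d), dense))  # True
```

Only an $R \times R$ Cholesky factorization is required, so for a fixed rank $R$ the cost grows linearly in the number of series $B$, which is what makes this parameterization attractive for high-dimensional data.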

Paper Structure (37 sections, 35 equations, 30 figures, 15 tables)

Figures (30)

  • Figure 1: Contemporaneous covariance matrix $\operatorname{Cov}(\boldsymbol{\eta}_{t}, \boldsymbol{\eta}_{t})$ and cross-covariance matrix $\operatorname{Cov}(\boldsymbol{\eta}_{t-\Delta}, \boldsymbol{\eta}_{t}), \Delta=1,2,3$, computed from the one-step-ahead prediction residuals of GPVar on a batch of time series from the $\mathtt{m4\_hourly}$ dataset. For visualization clarity, covariance values are clipped to the range $[0,0.6]$.
  • Figure 2: Graphic illustration of Eq. (eqn:eq_bat), where $B$ is the number of time series in a batch, $R$ is the rank of the covariance factor, $D$ is the length of the time window over which cross-correlation is modeled, and $P$ and $Q$ are the conditioning range and the prediction range, respectively. Cross-correlation is modeled by introducing correlation within each row of the matrix $\mathbf{r}_{t-D+1:t}$.
  • Figure 3: Illustration of the training process. Following Salinas et al. (2019), time series dimensions are randomly sampled, and the base model (e.g., an RNN) is unrolled for each dimension individually (e.g., dimensions 1, 2, 4, followed by 1, 3, 4 as depicted). The model parameters are shared across all time series dimensions. A batch of time series variables $\boldsymbol{z}_t^{\text{bat}}$ contains the time series vectors $\boldsymbol{z}_{t^\prime}$ for time steps $t-D+1$ through $t$. In contrast to Salinas et al. (2019), our approach explicitly models dependencies over this extended temporal window during training.
  • Figure 4: (a) Component weights for generating $\boldsymbol{C}_t$ for a batch of time series ($B=8$) from the $\mathtt{m4\_hourly}$ dataset, obtained with the GPVar model. Parameters $w_0, w_1, w_2$ are the component weights of the kernel matrices with lengthscales $l=0.5,1.5,2.5$, and $w_3$ is the component weight of the identity matrix. Shaded areas distinguish different days; (b) The autocorrelation function (ACF) implied by the correlation matrix $\boldsymbol{C}_t$ at 17:00. Because the ACF decays rapidly, only 12 lags are plotted; (c) The corresponding covariance matrix $\boldsymbol{\Sigma}_t^{\text{bat}}$ of the associated target variables at 17:00. A zoomed-in $3B \times 3B$ region is shown in the plot, where the diagonal blocks are the $B\times B$ covariance matrices $\boldsymbol{\Sigma}_{t^\prime}$ of $\mathbf{z}_{t^\prime}$ over three consecutive time steps, and the off-diagonal blocks are the cross-covariances $\operatorname{Cov}(\mathbf{z}_{t-\Delta}, \mathbf{z}_{t})$, $\forall \Delta \neq 0$. For visualization clarity, covariance values are clipped to the range $[0,0.03]$.
  • Figure 5: Training and validation loss versus training time for the GPVar model. "w/o" denotes the model without time-dependent errors, while "w/" denotes our method.
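Figures 2 and 4 describe how cross-covariance over a window of $D$ steps arises: rows of the latent matrix $\mathbf{r}_{t-D+1:t}$ are correlated through $\boldsymbol{C}_t$, a weighted combination of kernel matrices (lengthscales $0.5, 1.5, 2.5$) and the identity. The sketch below is a hypothetical reconstruction from the captions, not the authors' code. It assumes squared-exponential kernels, non-negative weights summing to one (so $\boldsymbol{C}_t$ has a unit diagonal), errors $\boldsymbol{\eta}_{t^\prime} = \boldsymbol{L}_{t^\prime}\mathbf{r}_{t^\prime} + \boldsymbol{\epsilon}_{t^\prime}$ with diagonal, temporally independent $\boldsymbol{\epsilon}_{t^\prime}$, and it uses ordinary Gaussian conditioning for the residual-based calibration mentioned in the abstract. All function names are illustrative.

```python
import numpy as np


def kernel_mixture_corr(D, weights, lengthscales):
    """Hypothetical D x D correlation matrix: convex combination of squared-exponential
    kernel matrices over lags plus an identity component (last entry of `weights`)."""
    lags = np.arange(D)
    dist2 = (lags[:, None] - lags[None, :]) ** 2
    C = weights[-1] * np.eye(D)
    for w, l in zip(weights[:-1], lengthscales):
        C = C + w * np.exp(-0.5 * dist2 / l**2)
    return C  # unit diagonal because the weights sum to one


def batched_covariance(Ls, ds, C):
    """Covariance of the stacked errors [eta_{t-D+1}; ...; eta_t] (length D*B), assuming
    Cov(r_i, r_j) = C[i, j] * I_R and diagonal noise with variances ds[i]."""
    D = len(Ls)
    B, R = Ls[0].shape
    L_bat = np.zeros((D * B, D * R))
    for i, L in enumerate(Ls):
        L_bat[i * B:(i + 1) * B, i * R:(i + 1) * R] = L  # block-diagonal factor
    cov_r = np.kron(C, np.eye(R))                        # block (i, j) = C[i, j] * I_R
    return L_bat @ cov_r @ L_bat.T + np.diag(np.concatenate(ds))


def calibrate_last_step(mu_t, Sigma_bat, resid_past, B):
    """Condition the last B-dim block on the observed residuals of the first D-1 steps."""
    S_oo = Sigma_bat[:-B, :-B]
    S_to = Sigma_bat[-B:, :-B]
    S_tt = Sigma_bat[-B:, -B:]
    gain = np.linalg.solve(S_oo, S_to.T).T               # S_to @ inv(S_oo)
    mean = mu_t + gain @ resid_past
    cov = S_tt - gain @ S_to.T
    return mean, cov


# Example with B = 4 series, rank R = 2, window D = 3.
rng = np.random.default_rng(0)
B, R, D = 4, 2, 3
C = kernel_mixture_corr(D, np.array([0.3, 0.3, 0.2, 0.2]), [0.5, 1.5, 2.5])
Ls = [rng.normal(size=(B, R)) for _ in range(D)]
ds = [rng.uniform(0.1, 0.5, size=B) for _ in range(D)]
Sigma_bat = batched_covariance(Ls, ds, C)
mean, cov = calibrate_last_step(np.zeros(B), Sigma_bat, rng.normal(size=(D - 1) * B), B)
```

If all weight sits on the identity ($w_3 = 1$), the off-diagonal blocks vanish and the calibrated mean equals $\boldsymbol{\mu}_t$, recovering the temporally independent case; larger kernel weights let past residuals shift the forecast and shrink its conditional covariance.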