Scalable Dynamic Mixture Model with Full Covariance for Probabilistic Traffic Forecasting
Seongjin Choi, Nicolas Saunier, Vincent Zhihao Zheng, Martin Trepanier, Lijun Sun
TL;DR
This work addresses the challenge of non-stationary, spatiotemporally correlated forecasting errors in traffic speed prediction by modeling the error distribution as a dynamic mixture of zero-mean matrix-variate Gaussians. Each component uses a Kronecker-structured covariance, Σ^k = Σ^k_Q ⊗ Σ^k_N, with time-varying weights ω_t^k = f_ω(X_t) and precision-based Cholesky parameterizations to enable scalable training. The model optimizes a hybrid loss, L_DynMix = (1−ρ)L_MSE + ρL_NLL, which combines mean prediction with probabilistic uncertainty. It demonstrates improved RMSE, MAPE, and MAE on the PEMS-BAY and METR-LA datasets, and its mixture components reveal interpretable spatiotemporal patterns. The approach acts as an add-on to existing deep traffic models, offering a principled way to capture multimodal and dynamic error structures. Possible extensions include non-Gaussian components, relaxed covariance structures, and graph-informed precision matrices, which may yield further performance gains.
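The Kronecker structure and hybrid loss described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names, the argument layout, and the choice of lower-triangular Cholesky factors of the precision matrices (Λ_N = L_N L_N^T, Λ_Q = L_Q L_Q^T) are assumptions made for illustration. In the paper's setting the mixture weights would come from a learned function f_ω(X_t); here they are simply passed in as numbers.

```python
import numpy as np

def matrix_normal_logpdf(R, L_N, L_Q):
    """Log-density of a zero-mean matrix normal for an N x Q error matrix R.

    L_N, L_Q are lower-triangular Cholesky factors of the sensor (N x N) and
    horizon (Q x Q) precision matrices (an assumed parameterization).
    """
    N, Q = R.shape
    # Quadratic form tr(Lam_Q R^T Lam_N R) equals ||L_N^T R L_Q||_F^2,
    # so the full NQ x NQ covariance is never built.
    Z = L_N.T @ R @ L_Q
    quad = np.sum(Z ** 2)
    # 0.5 * log|Lam_Q kron Lam_N| from the Cholesky diagonals.
    half_logdet = (Q * np.sum(np.log(np.diag(L_N)))
                   + N * np.sum(np.log(np.diag(L_Q))))
    return -0.5 * N * Q * np.log(2.0 * np.pi) + half_logdet - 0.5 * quad

def mixture_nll(R, weights, factors):
    """Negative log-likelihood of R under a K-component mixture.

    weights: K mixture weights summing to 1 (hypothetically the output of f_w).
    factors: list of K (L_N, L_Q) tuples, one per component.
    """
    log_terms = np.array([
        np.log(w) + matrix_normal_logpdf(R, L_N, L_Q)
        for w, (L_N, L_Q) in zip(weights, factors)
    ])
    m = log_terms.max()  # log-sum-exp for numerical stability
    return -(m + np.log(np.sum(np.exp(log_terms - m))))

def hybrid_loss(y_true, y_pred, weights, factors, rho=0.5):
    """(1 - rho) * MSE + rho * NLL, evaluated on the residual matrix."""
    R = y_true - y_pred
    mse = np.mean(R ** 2)
    nll = mixture_nll(R, weights, factors)
    return (1.0 - rho) * mse + rho * nll
```

Because the quadratic form and log-determinant are computed from the two small factors, the cost scales with the N x N and Q x Q matrices rather than with the NQ x NQ Kronecker product, which is the scalability argument the summary alludes to.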
Abstract
Deep learning-based multivariate and multistep-ahead traffic forecasting models are typically trained with the mean squared error (MSE) or mean absolute error (MAE) as the loss function in a sequence-to-sequence setting, which implicitly assumes that the errors follow an independent and isotropic Gaussian or Laplacian distribution. However, such assumptions are often unrealistic for real-world traffic forecasting tasks, where the distribution of spatiotemporal forecasting errors is complex, with strong concurrent correlations across both sensors and forecasting horizons that vary over time. In this paper, we model the time-varying distribution of the matrix-variate error process as a dynamic mixture of zero-mean Gaussian distributions. To achieve efficiency, flexibility, and scalability, we parameterize each mixture component using a matrix normal distribution and allow the mixture weights to change over time and to be predictable. The proposed method can be seamlessly integrated into existing deep-learning frameworks with only a few additional parameters to be learned. We evaluate the proposed method on a traffic speed forecasting task and find that it not only improves model performance but also provides interpretable spatiotemporal correlation structures.
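For reference, the dynamic mixture of matrix normal distributions can be written out as follows. This is a sketch using the standard matrix normal density; the symbols R_t (the N x Q error matrix at time t, with N sensors and Q forecasting horizons) and K (the number of components) are notation assumed here and may differ from the paper's own.

```latex
% Dynamic mixture over the error matrix R_t \in \mathbb{R}^{N \times Q}
p(R_t \mid X_t) = \sum_{k=1}^{K} \omega_t^k \,
    \mathcal{MN}\!\left(R_t;\, 0,\, \Sigma_N^k,\, \Sigma_Q^k\right),
\qquad \sum_{k=1}^{K} \omega_t^k = 1, \quad \omega_t^k \ge 0 .

% Zero-mean matrix normal density of one component
\mathcal{MN}\!\left(R;\, 0,\, \Sigma_N,\, \Sigma_Q\right)
  = \frac{\exp\!\left(-\tfrac{1}{2}\,
      \operatorname{tr}\!\left[\Sigma_Q^{-1} R^{\top} \Sigma_N^{-1} R\right]\right)}
     {(2\pi)^{NQ/2}\, |\Sigma_N|^{Q/2}\, |\Sigma_Q|^{N/2}} .

% Equivalent vectorized form with a Kronecker-structured covariance
\operatorname{vec}(R) \sim \mathcal{N}\!\left(0,\; \Sigma_Q \otimes \Sigma_N\right).
```

Setting K = 1 with both covariances proportional to the identity recovers the independent, isotropic Gaussian assumption that underlies MSE training.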
