Prequential posteriors
Shreya Sinha-Roy, Richard G. Everitt, Christian P. Robert, Ritabrata Dutta
TL;DR
This work introduces prequential posteriors, a likelihood-free Bayesian framework for data assimilation in deep generative forecasting models (DGFMs), addressing the challenge of intractable likelihoods under temporal dependence and model misspecification. Using a predictive-sequential (prequential) loss and a Bernstein–von Mises-type analysis, the authors establish predictive consistency and posterior concentration around predictive-optimal parameters, even when the true data-generating process lies outside the model class. They implement a scalable waste-free sequential Monte Carlo (SMC) scheme with preconditioned gradient-based kernels to explore the high-dimensional parameter spaces typical of DGFMs efficiently. The approach is validated on synthetic Lorenz-96 dynamics and real-world WeatherBench data, where it shows improved calibration, forecast accuracy, and reliability over misspecified baselines, with practical implications for data assimilation in complex dynamical systems.
Abstract
Data assimilation, the task of updating a forecasting model as new data are observed, is fundamental to applications ranging from weather prediction to online reinforcement learning. Deep generative forecasting models (DGFMs) have shown excellent performance in these areas, but assimilating data into such models is challenging because their likelihood functions are intractable. This limitation restricts the use of standard Bayesian data assimilation methodologies for DGFMs. To overcome this, we introduce prequential posteriors, based on a predictive-sequential (prequential) loss function, an approach naturally suited to the temporally dependent data that are the focus of forecasting tasks. Since the true data-generating process often lies outside the assumed model class, we adopt an alternative notion of consistency and prove that, under mild conditions, both the prequential loss minimizer and the prequential posterior concentrate around parameters with optimal predictive performance. For scalable inference, we employ easily parallelizable waste-free sequential Monte Carlo (SMC) samplers with preconditioned gradient-based kernels, enabling efficient exploration of high-dimensional parameter spaces such as those of DGFMs. We validate our method on both a synthetic multi-dimensional time series and a real-world meteorological dataset, highlighting its practical utility for data assimilation in complex dynamical systems.
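To make the idea of a prequential loss concrete, the following is a minimal sketch, not the authors' implementation. It assumes the one-step-ahead forecasts are scored with the energy score (the abstract does not name a scoring rule), that the posterior takes the generalized-Bayes form prior × exp(-w × loss) with a hypothetical learning rate `w`, and that a toy AR(1) simulator (`simulate_next`) stands in for a deep generative forecaster. All function names and the Gaussian prior are illustrative assumptions.

```python
import numpy as np

def energy_score(samples, y):
    """Monte Carlo estimate of the energy score (beta = 1); lower is better.
    samples: (m, d) draws from the forecast, y: (d,) realized observation."""
    m = samples.shape[0]
    term_obs = np.linalg.norm(samples - y, axis=1).mean()
    # Pairwise distances between forecast draws (the diagonal is zero).
    pair = np.linalg.norm(samples[:, None, :] - samples[None, :, :], axis=2)
    term_spread = pair.sum() / (m * (m - 1))
    return term_obs - 0.5 * term_spread

def simulate_next(theta, y_prev, m, rng):
    """Hypothetical stand-in for a generative forecaster:
    m draws of y_t given y_{t-1} from an AR(1) model with unit noise."""
    return theta * y_prev + rng.standard_normal((m, y_prev.shape[0]))

def prequential_loss(theta, ys, m=50, seed=0):
    """Sum of scores of one-step-ahead forecasts, each judged against
    the observation that was actually realized next."""
    rng = np.random.default_rng(seed)  # fixed seed: common random numbers
    return sum(
        energy_score(simulate_next(theta, ys[t - 1], m, rng), ys[t])
        for t in range(1, len(ys))
    )

def log_prequential_posterior(theta, ys, w=1.0, prior_sd=10.0):
    """Unnormalized log posterior: Gaussian log prior (an assumption)
    minus w times the prequential loss."""
    log_prior = -0.5 * (theta / prior_sd) ** 2
    return log_prior - w * prequential_loss(theta, ys)
```

Only forward simulation from the model is needed here, never a likelihood evaluation, which is what makes this style of loss usable for models with intractable likelihoods. In practice a sampler such as SMC would target `log_prequential_posterior` over a much higher-dimensional parameter than the scalar `theta` used in this toy.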
