Mixed moving average field guided learning for spatio-temporal data
Imma Valentina Curato, Orkun Furat, Lorenzo Proietti, Bennet Stroeh
TL;DR
This work addresses forecasting spatio-temporal data when the predictive distribution is unknown by grounding learning in influenced mixed moving average fields (MMAFs). It introduces a theory-guided ML framework, MMAF-guided learning, that uses a spatio-temporal embedding and a generalized Bayesian (randomized) estimator over Lipschitz predictors to produce ensemble, one-time-ahead forecasts with a causal interpretation. The authors establish fixed-time and any-time PAC Bayesian bounds for data generated by $\theta$-lex weakly dependent MMAFs, and they derive practical embedding strategies and estimator designs (including a randomized Gibbs estimator) to optimize generalization performance. Validation on data simulated from spatio-temporal Ornstein-Uhlenbeck processes shows that the resulting ensemble forecasts yield narrow interquartile ranges that contain the true test values, within a transparent, causally interpretable forecasting framework for raster data cubes. The approach provides a principled path to non-vacuous generalization guarantees in dependent spatio-temporal settings and offers guidelines for selecting embeddings and hyperparameters to balance dependence structure and learning efficiency. Overall, MMAF-guided learning contributes a theoretically grounded, causality-aware methodology for interpretable spatio-temporal forecasting with explicit uncertainty quantification.
Abstract
Influenced mixed moving average fields are a versatile modeling class for spatio-temporal data. However, their predictive distribution is not generally known. Under this modeling assumption, we define a novel spatio-temporal embedding and a theory-guided machine learning approach that employs a generalized Bayesian algorithm to make ensemble forecasts. We use Lipschitz predictors and determine fixed-time and any-time PAC Bayesian bounds in the batch learning setting. A highlight of our methodology is that it performs causal forecasts; another is its potential application to data with spatial and temporal short- and long-range dependence. We then test the performance of our learning methodology by using linear predictors and data sets simulated from a spatio-temporal Ornstein-Uhlenbeck process.
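
To make the idea of a generalized Bayesian (Gibbs) estimator with linear predictors concrete, the following is a minimal Python sketch and not the authors' algorithm. Everything specific in it is an assumption made for illustration: a toy AR(1) series stands in for the spatio-temporal Ornstein-Uhlenbeck data, the embedding simply uses the two previous values, the prior is a standard Gaussian, the inverse temperature `beta` is set to the training sample size, and the Gibbs posterior is sampled with a plain random-walk Metropolis scheme. The absolute loss is used because it is Lipschitz in the prediction.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for the data (assumption): an AR(1) series,
# NOT the spatio-temporal Ornstein-Uhlenbeck field used in the paper.
n = 400
y = np.zeros(n)
for t in range(1, n):
    y[t] = 0.7 * y[t - 1] + rng.normal(scale=0.5)

# Hypothetical embedding: predict y[t] from (y[t-1], y[t-2]).
X = np.column_stack([y[1:-1], y[:-2]])
Y = y[2:]

# Hold out the last example for a one-time-ahead forecast.
X_train, Y_train = X[:-1], Y[:-1]
x_test, y_test = X[-1], Y[-1]


def empirical_risk(theta):
    """Mean absolute error of the linear predictor x -> x @ theta."""
    return np.mean(np.abs(Y_train - X_train @ theta))


def log_gibbs(theta, beta, prior_scale=1.0):
    """Unnormalized log density of the Gibbs posterior:
    exp(-beta * empirical risk) times a Gaussian prior (assumed)."""
    log_prior = -0.5 * np.sum(theta ** 2) / prior_scale ** 2
    return -beta * empirical_risk(theta) + log_prior


def sample_gibbs(beta, n_samples=2000, burn_in=500, step=0.05):
    """Random-walk Metropolis draws from the Gibbs posterior."""
    theta = np.zeros(2)
    logp = log_gibbs(theta, beta)
    draws = []
    for i in range(n_samples + burn_in):
        proposal = theta + step * rng.normal(size=2)
        logp_prop = log_gibbs(proposal, beta)
        # Accept with probability min(1, exp(logp_prop - logp)).
        if np.log(rng.uniform()) < logp_prop - logp:
            theta, logp = proposal, logp_prop
        if i >= burn_in:
            draws.append(theta.copy())
    return np.array(draws)


# beta = training sample size is an illustrative choice only.
draws = sample_gibbs(beta=float(len(Y_train)))

# Each sampled predictor gives one forecast; together they form the ensemble.
ensemble = draws @ x_test
q25, q50, q75 = np.percentile(ensemble, [25, 50, 75])
print(f"interquartile range: [{q25:.3f}, {q75:.3f}], true value: {y_test:.3f}")
```

The interquartile range printed at the end mirrors the kind of ensemble summary described in the TL;DR. In this toy setting it reflects only the spread of the sampled parameters, so it should not be read as a calibrated predictive interval.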
