DeepRV: Accelerating spatiotemporal inference with pre-trained neural priors

Jhonathan Navott, Daniel Jenson, Seth Flaxman, Elizaveta Semenova

TL;DR

DeepRV addresses the cubic scaling of Gaussian processes (GPs) for spatiotemporal data by learning a decoder-only neural surrogate that maps kernel parameters and a latent draw to GP-like realizations, achieving $O(N^2)$ inference while closely matching full GP accuracy. Trained to reproduce GP draws and implemented with MLP, gMLP, or transformer (with kernel-attention bias) architectures, it attains GP-level predictive accuracy and hyperparameter recovery with substantial speedups (up to ~25x) on large datasets. The approach supports non-separable spatiotemporal kernels and city-scale applications (e.g., London LSOA, $n\approx 5{,}000$), and functions as a drop-in GP prior in probabilistic programming frameworks. Ablations show that decoder-only designs and gMLP offer favorable accuracy-efficiency trade-offs, while transformer-based variants extend to variable-location inputs at higher compute cost. Limitations include pretraining cost and a deterministic emulator assumption; future work aims to reduce pretraining time and broaden applicability.

Abstract

Gaussian Processes (GPs) provide a flexible and statistically principled foundation for modelling spatiotemporal phenomena, but their $O(N^3)$ scaling makes them intractable for large datasets. Approximate methods such as variational inference (VI), inducing points (sparse GPs), low-rank factorizations (RFFs), and local factorizations and approximations (INLA) improve scalability but trade off accuracy or flexibility. We introduce DeepRV, a neural-network surrogate that closely matches full GP accuracy, including hyperparameter estimates, while reducing computational complexity to $O(N^2)$, increasing scalability and inference speed. DeepRV serves as a drop-in replacement for GP prior realisations in, e.g., MCMC-based probabilistic programming pipelines, preserving full model flexibility. Across simulated benchmarks, non-separable spatiotemporal GPs, and a real-world application to education deprivation in London (n = 4,994 locations), DeepRV achieves the highest fidelity to exact GPs while substantially accelerating inference. Code is provided in the accompanying ZIP archive, with all experiments run on a single consumer-grade GPU to ensure accessibility for practitioners.
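
The contrast between an exact GP prior draw and a decoder-style surrogate can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the two-layer MLP, the mean-squared-error objective, the jitter value, and every function name below are assumptions chosen for illustration, and the decoder weights are random rather than pre-trained. The exact draw needs a Cholesky factorisation, which costs $O(N^3)$, while the decoder needs only dense matrix-vector products.

```python
import numpy as np


def matern12_kernel(x, lengthscale, variance=1.0):
    """Matérn-1/2 (exponential) kernel on 1-D locations x of shape (N,)."""
    dist = np.abs(x[:, None] - x[None, :])
    return variance * np.exp(-dist / lengthscale)


def exact_gp_draw(x, lengthscale, z, jitter=1e-6):
    """Exact GP prior draw f = L z with K = L L^T; the Cholesky step is O(N^3)."""
    K = matern12_kernel(x, lengthscale) + jitter * np.eye(len(x))
    L = np.linalg.cholesky(K)
    return L @ z


def init_decoder(n, hidden, rng):
    """Random weights for a hypothetical two-layer MLP decoder (an assumption)."""
    return {
        "W1": rng.normal(scale=1.0 / np.sqrt(n + 1), size=(n + 1, hidden)),
        "b1": np.zeros(hidden),
        "W2": rng.normal(scale=1.0 / np.sqrt(hidden), size=(hidden, n)),
        "b2": np.zeros(n),
    }


def decoder_draw(params, lengthscale, z):
    """Surrogate draw: one forward pass on the concatenated [z, lengthscale].

    Each dense layer costs O(N * hidden) operations, with no factorisation.
    """
    inputs = np.concatenate([z, [lengthscale]])
    h = np.tanh(inputs @ params["W1"] + params["b1"])
    return h @ params["W2"] + params["b2"]


def pretraining_loss(params, x, lengthscale, z):
    """Assumed objective: MSE between the decoder output and the exact GP
    draw generated from the same (lengthscale, z) pair."""
    target = exact_gp_draw(x, lengthscale, z)
    pred = decoder_draw(params, lengthscale, z)
    return float(np.mean((pred - target) ** 2))


rng = np.random.default_rng(0)
n = 64
x = np.linspace(0.0, 1.0, n)
z = rng.standard_normal(n)
params = init_decoder(n, hidden=n, rng=rng)

f_exact = exact_gp_draw(x, 0.3, z)            # needs a Cholesky factorisation
f_surrogate = decoder_draw(params, 0.3, z)    # matrix-vector products only
loss = pretraining_loss(params, x, 0.3, z)    # quantity pre-training would minimise
```

With a hidden width proportional to $N$, as in this sketch, a dense forward pass costs on the order of $N^2$ operations, which is one way to read the stated complexity. Inside an MCMC loop, the pre-trained decoder would be called wherever the exact draw would otherwise be computed, with the latent vector and the kernel parameters as the sampled quantities.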

Paper Structure

This paper contains 40 sections, 13 equations, 11 figures, 9 tables.

Figures (11)

  • Figure 1: DeepRV predictive evaluation on the London LSOA education deprivation dataset ($n$ = 4,994 locations). Panels show (from left to right): observed $\mathbf{y}$ (masked), full true $\mathbf{y}$, DeepRV posterior predictive mean $\hat{\mathbf{y}}$, and DeepRV posterior predictive uncertainty (standard deviation).
  • Figure 2: Left panel details the data generating process used for pre-training. The middle panel shows the input and output of DeepRV during pre-training. In the right panel are two statistical models, the first representing a traditional model that uses a GP prior and the second one that swaps DeepRV for the GP.
  • Figure 3: Matérn-1/2 benchmarking results: (a) Posterior predictive MSE relative to full GP MCMC; (b) Wasserstein distance between inferred and full GP MCMC lengthscale posteriors. Results are averaged across true lengthscales and grid sizes over 15 runs, with 10% and 90% quantiles reported.
  • Figure 4: Spatiotemporal GP inferred hyperparameter posterior distributions. DeepRV closely matches GP on all hyperparameter posterior distributions.
  • Figure 5: Predicted prevalence at 100 randomly selected MSOAs.
  • ...and 6 more figures