BlockGPT: Spatio-Temporal Modelling of Rainfall via Frame-Level Autoregression

Cristian Meo, Varun Sarathchandran, Avijit Majhi, Shao Hung, Carlo Saccardi, Ruben Imhoff, Roberto Deidda, Remko Uijlenhoet, Justin Dauwels

TL;DR

BlockGPT rethinks precipitation nowcasting by replacing token-level autoregression with frame-level autoregression, so that an entire precipitation frame is predicted in one forward pass. The two-stage pipeline first compresses frames into discrete latent tokens with a VQ-GAN, then models temporal dynamics with a transformer that attends bidirectionally within a frame but causally across frames. On KNMI and SEVIR, BlockGPT achieves state-of-the-art event localization (CSI, AUC-ROC) and inference up to 31× faster than token-based and diffusion baselines, with qualitative evidence of better morphological fidelity. The approach promises improved real-time nowcasting for warning systems and provides a scalable backbone that combines spatial modelling with rapid inference, with potential extensions to uncertainty quantification and physics-informed constraints.

Abstract

Predicting precipitation maps is a highly complex spatiotemporal modeling task, critical for mitigating the impacts of extreme weather events. Short-term precipitation forecasting, or nowcasting, requires models that are not only accurate but also computationally efficient for real-time applications. Current methods, such as token-based autoregressive models, often suffer from flawed inductive biases and slow inference, while diffusion models can be computationally intensive. To address these limitations, we introduce BlockGPT, a generative autoregressive transformer that uses a batched tokenization (Block) method to predict full two-dimensional fields (frames) at each time step. Conceived as a model-agnostic paradigm for video prediction, BlockGPT factorizes space-time by using self-attention within each frame and causal attention across frames; in this work, we instantiate it for precipitation nowcasting. We evaluate BlockGPT on two precipitation datasets, KNMI (Netherlands) and SEVIR (U.S.), comparing it to state-of-the-art baselines including token-based (NowcastingGPT) and diffusion-based (DiffCast+Phydnet) models. The results show that BlockGPT achieves superior accuracy and event localization, as measured by categorical metrics, and inference up to 31× faster than comparable baselines.
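The space-time factorization described in the abstract can be illustrated with a block-causal attention mask. The following is a minimal sketch under stated assumptions, not the authors' implementation: the function names `block_causal_mask` and `forward_passes`, the flattened frame-major token ordering, and the example sizes are all hypothetical. It shows one plausible reading of "self-attention within each frame and causal attention across frames", plus the decoding-step count that motivates frame-level prediction.

```python
def block_causal_mask(num_frames: int, tokens_per_frame: int) -> list[list[bool]]:
    """Return mask[i][j] == True when query token i may attend to key token j.

    Assumption: tokens are flattened frame by frame, so token i belongs to
    frame i // tokens_per_frame. A token sees every token of its own frame
    (bidirectional within a frame) and of all earlier frames (causal in time).
    """
    total = num_frames * tokens_per_frame
    mask = []
    for i in range(total):
        frame_i = i // tokens_per_frame
        mask.append([(j // tokens_per_frame) <= frame_i for j in range(total)])
    return mask


def forward_passes(num_frames: int, tokens_per_frame: int) -> tuple[int, int]:
    """Decoding steps needed to generate num_frames frames.

    Returns (token-level steps, frame-level steps): one pass per token
    versus one pass per frame.
    """
    return num_frames * tokens_per_frame, num_frames


if __name__ == "__main__":
    # Hypothetical sizes chosen only for display.
    for row in block_causal_mask(num_frames=3, tokens_per_frame=2):
        print("".join("x" if allowed else "." for allowed in row))
    print(forward_passes(num_frames=6, tokens_per_frame=64))
```

With a standard token-level causal mask the pattern would be lower-triangular; here it is block lower-triangular, which is what allows all tokens of a frame to be produced together. The step counts only compare the number of forward passes and do not by themselves account for the reported 31× speedup, which also depends on model size and hardware.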

Paper Structure

This paper contains 32 sections, 16 equations, 13 figures, 4 tables.

Figures (13)

  • Figure 1: MSE, PCC, CSI, and FAR of BlockGPT and related baselines, on KNMI and SEVIR datasets. Results are averaged across 3 seeds.
  • Figure 2: Violin plots of event average precipitation in the KNMI dataset.
  • Figure 3: Violin plots of event average precipitation in the SEVIR dataset.
  • Figure 4: KNMI Event 1. Two input frames ($-60$, $0$ min) and four forecasts ($+30$, $+60$, $+120$, $+180$ min). BlockGPT preserves the rainband morphology and advection but modestly overestimates the core intensity at long lead times; baselines miss the shape and location.
  • Figure 5: KNMI Event 2. BlockGPT follows the rapid structural changes and localisation of intense cells across lead times; baselines underperform, particularly for growth and displacement.
  • ...and 8 more figures
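
The categorical metrics named in Figure 1 (CSI and FAR) follow the standard contingency-table definitions, CSI = hits / (hits + misses + false alarms) and FAR = false alarms / (hits + false alarms). The sketch below is illustrative only: the helper names, the toy 2×2 fields, and the threshold of 1.0 are assumptions, and the thresholds actually used in the paper are not stated in this excerpt.

```python
def contingency_counts(pred, obs, threshold):
    """Count hits, misses and false alarms for one pair of 2-D fields.

    A grid cell is an "event" when its value is >= threshold
    (the threshold value is an assumption of this sketch).
    """
    hits = misses = false_alarms = 0
    for pred_row, obs_row in zip(pred, obs):
        for p, o in zip(pred_row, obs_row):
            p_event, o_event = p >= threshold, o >= threshold
            if p_event and o_event:
                hits += 1
            elif o_event:
                misses += 1
            elif p_event:
                false_alarms += 1
    return hits, misses, false_alarms


def csi(hits, misses, false_alarms):
    """Critical Success Index: hits / (hits + misses + false alarms)."""
    denom = hits + misses + false_alarms
    return hits / denom if denom else float("nan")


def far(hits, false_alarms):
    """False Alarm Ratio: false alarms / (hits + false alarms)."""
    denom = hits + false_alarms
    return false_alarms / denom if denom else float("nan")


if __name__ == "__main__":
    # Toy fields, not data from the paper.
    pred = [[0.0, 2.0], [3.0, 0.5]]
    obs = [[0.0, 2.5], [0.2, 4.0]]
    h, m, f = contingency_counts(pred, obs, threshold=1.0)
    print(h, m, f, csi(h, m, f), far(h, f))
```

Higher CSI and lower FAR are better; both ignore correct negatives, so they are not inflated by the large dry areas typical of precipitation fields.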