BlockGPT: Spatio-Temporal Modelling of Rainfall via Frame-Level Autoregression
Cristian Meo, Varun Sarathchandran, Avijit Majhi, Shao Hung, Carlo Saccardi, Ruben Imhoff, Roberto Deidda, Remko Uijlenhoet, Justin Dauwels
TL;DR
BlockGPT rethinks precipitation nowcasting by replacing token-level autoregression with frame-level autoregression, so that an entire precipitation frame is predicted in a single forward pass. The two-stage pipeline first compresses each frame into discrete latent tokens with a VQ-GAN, then models temporal dynamics with a transformer that attends bidirectionally within a frame and causally across frames. On KNMI and SEVIR, BlockGPT achieves state-of-the-art event localization (CSI, AUC-ROC) and inference up to 31× faster than token-based and diffusion baselines, with qualitative evidence of better morphological fidelity. The approach targets real-time nowcasting for warning systems and offers a scalable backbone that combines spatial modelling with rapid inference, with potential extensions to uncertainty quantification and physics-informed constraints.
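The attention pattern described above (bidirectional within a frame, causal across frames) can be illustrated with a block-causal mask over the flattened token sequence. This is a minimal sketch and not the authors' implementation: the function name `block_causal_mask`, its parameters, and the convention that `True` means "may attend" are assumptions made for illustration.

```python
def block_causal_mask(num_frames: int, tokens_per_frame: int) -> list[list[bool]]:
    """Return mask[i][j] == True when query token i may attend to key token j.

    Hypothetical helper: tokens are assumed to be flattened frame by frame,
    so token position p belongs to frame p // tokens_per_frame.
    """
    total = num_frames * tokens_per_frame
    # Frame index of every flattened token position.
    frame_of = [pos // tokens_per_frame for pos in range(total)]
    # A key is visible if its frame is the query's frame (bidirectional
    # within the frame) or any earlier frame (causal across time).
    return [[frame_of[j] <= frame_of[i] for j in range(total)] for i in range(total)]


mask = block_causal_mask(num_frames=3, tokens_per_frame=2)
for row in mask:
    print("".join("1" if allowed else "0" for allowed in row))
# 110000
# 110000
# 111100
# 111100
# 111111
# 111111
```

Each square block on the diagonal is fully filled, which is what distinguishes this pattern from the strictly lower-triangular mask of token-level autoregression.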
Abstract
Predicting precipitation maps is a highly complex spatiotemporal modeling task, critical for mitigating the impacts of extreme weather events. Short-term precipitation forecasting, or nowcasting, requires models that are not only accurate but also computationally efficient enough for real-time applications. Current methods, such as token-based autoregressive models, often suffer from flawed inductive biases and slow inference, while diffusion models can be computationally intensive. To address these limitations, we introduce BlockGPT, a generative autoregressive transformer that uses a batched tokenization (Block) method to predict a full two-dimensional field (frame) at each time step. Conceived as a model-agnostic paradigm for video prediction, BlockGPT factorizes space-time by using self-attention within each frame and causal attention across frames; in this work, we instantiate it for precipitation nowcasting. We evaluate BlockGPT on two precipitation datasets, KNMI (Netherlands) and SEVIR (U.S.), comparing it to state-of-the-art baselines including token-based (NowcastingGPT) and diffusion-based (DiffCast+PhyDNet) models. The results show that BlockGPT achieves superior accuracy and event localization, as measured by categorical metrics, with inference up to 31× faster than comparable baselines.
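Categorical metrics such as the critical success index (CSI) score a forecast by thresholding both the predicted and observed fields and counting hits, misses, and false alarms: CSI = hits / (hits + misses + false alarms). The sketch below shows that computation on flattened pixel values. The function name, the sample values, and the threshold of 1.0 are illustrative assumptions, not the evaluation code or thresholds used in the paper.

```python
def critical_success_index(forecast, observed, threshold):
    """CSI = hits / (hits + misses + false_alarms) at a given threshold.

    `forecast` and `observed` are equal-length sequences of pixel values
    (for example, flattened rain-rate maps). Hypothetical helper.
    """
    hits = misses = false_alarms = 0
    for f, o in zip(forecast, observed):
        f_event, o_event = f >= threshold, o >= threshold
        if f_event and o_event:
            hits += 1            # event forecast and observed
        elif o_event:
            misses += 1          # event observed but not forecast
        elif f_event:
            false_alarms += 1    # event forecast but not observed
    denom = hits + misses + false_alarms
    # Undefined when no event is forecast or observed anywhere.
    return hits / denom if denom else float("nan")


# Illustrative values only (not taken from KNMI or SEVIR).
forecast = [0.0, 2.5, 4.0, 0.5, 3.0]
observed = [0.0, 3.0, 0.2, 1.5, 2.0]
print(critical_success_index(forecast, observed, threshold=1.0))  # 0.5
```

Correct negatives do not enter the score, which is why CSI is commonly preferred for rare events such as heavy rainfall, where "no rain" pixels dominate the field.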
