A Space-Time Transformer for Precipitation Forecasting

Levi Harris, Tianlong Chen

TL;DR

This work tackles the challenge of nowcasting extreme precipitation with scalable AI-driven methods by introducing SaTformer, a full space-time attention transformer operating on patched HRIT satellite radiances. It reframes precipitation regression as a multi-class classification problem and uses a class-weighted cross-entropy loss to address severe class imbalance, enabling robust performance across both common and extreme rainfall events. Empirically, SaTformer achieves first place on the NeurIPS Weather4Cast 2025 Cumulative Rainfall task, with ablations demonstrating the advantages of 3D space-time attention and an appropriate binning strategy. The results suggest that end-to-end transformers with comprehensive spatiotemporal attention, coupled with careful task formulation, can offer strong, scalable precipitation nowcasting capabilities and may generalize to other low-token, space-time prediction problems.

Abstract

Meteorological agencies around the world rely on real-time flood guidance to issue life-saving advisories and warnings. For decades, traditional numerical weather prediction (NWP) models have been state-of-the-art for precipitation forecasting. However, physically-parameterized models suffer from two core limitations: first, solving PDEs to resolve atmospheric dynamics is computationally demanding, and second, these methods degrade in performance at nowcasting timescales (i.e., 0-4 hour lead times). Motivated by these shortcomings, recent work proposes AI-weather prediction (AI-WP) alternatives that learn to emulate analysis data with neural networks. While these data-driven approaches have enjoyed enormous success across diverse spatial and temporal resolutions, applications of video-understanding architectures to weather forecasting remain underexplored. To address this gap, we propose SaTformer: a video transformer built on full space-time attention that skillfully forecasts extreme precipitation from satellite radiances. Along with our novel architecture, we introduce techniques to tame long-tailed precipitation datasets. Namely, we reformulate precipitation regression as a classification problem and employ a class-weighted loss to address label imbalances. Our model scored first place on the NeurIPS Weather4Cast 2025 Cumulative Rainfall challenge. Code and model weights are available: https://github.com/leharris3/satformer
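The reformulation described above (binning continuous rainfall into classes, then weighting the loss per class) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration: the bin edges in `BIN_EDGES`, the inverse-frequency weighting rule, and the function names are hypothetical and are not taken from the paper or its repository.

```python
import math
from bisect import bisect_right
from collections import Counter

# Hypothetical bin edges in mm; the paper's actual binning is not stated here.
BIN_EDGES = [0.1, 1.0, 5.0, 20.0]  # 4 edges -> 5 classes


def to_class(rain_mm):
    """Map a continuous rainfall amount to a class index (0 = lowest bin)."""
    return bisect_right(BIN_EDGES, rain_mm)


def inverse_frequency_weights(labels, num_classes):
    """Assumed weighting rule: total / (num_classes * count).

    Classes that never occur in `labels` receive weight 0.0.
    """
    counts = Counter(labels)
    total = len(labels)
    return [
        total / (num_classes * counts[c]) if counts[c] else 0.0
        for c in range(num_classes)
    ]


def weighted_cross_entropy(logits, label, weights):
    """Class-weighted cross-entropy for one sample via a stable log-softmax."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return -weights[label] * (logits[label] - log_z)


# Toy, made-up targets skewed towards no-rain, mimicking a long-tailed dataset.
rain = [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.5, 0.5, 3.0, 30.0]
labels = [to_class(r) for r in rain]
weights = inverse_frequency_weights(labels, num_classes=len(BIN_EDGES) + 1)
```

With this weighting, a misclassified sample from a rare heavy-rain bin contributes more to the loss than one from the dominant no-rain bin, which is the general effect a class-weighted loss is meant to have on imbalanced labels.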

Paper Structure

This paper contains 23 sections, 12 equations, 2 figures, 2 tables.

Figures (2)

  • Figure 1: The SaTformer architecture. (Left) We partition each frame $t \in \{1, \ldots, T\}$ in a sequence of satellite radiances into $N$ non-overlapping patches. Each patch is projected into a token representation, and a class token (CLS) is prepended to the token sequence. (Center) Model encoder design. Token sequences pass through $L$ transformer layers. The class token is taken from the output of the final transformer layer and passed to a single-layer prediction head. (Right) Full 3D attention; all tokens attend to all tokens over time and space.
  • Figure 2: Empirical distribution of target labels within our training set. Note the log scale on the y-axis; targets for our task are skewed heavily towards low/no rain events.
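The tokenization in the Figure 1 caption can be sketched in plain Python to show how the sequence length arises under full space-time attention: $T$ frames of $N$ patches each give $T \cdot N$ tokens, plus one CLS token. The helper names, the time-major ordering, and the toy frame sizes below are assumptions for illustration, not the authors' implementation; the learned patch projection is omitted.

```python
def patchify(frame, patch):
    """Split an H x W frame (list of rows) into non-overlapping patch x patch
    blocks, each flattened in row-major order."""
    h, w = len(frame), len(frame[0])
    if h % patch or w % patch:
        raise ValueError("frame size must be divisible by patch size")
    patches = []
    for top in range(0, h, patch):
        for left in range(0, w, patch):
            patches.append([
                frame[r][c]
                for r in range(top, top + patch)
                for c in range(left, left + patch)
            ])
    return patches


def space_time_tokens(frames, patch):
    """Flatten T frames into one sequence of T * N patch vectors (time-major)."""
    return [p for frame in frames for p in patchify(frame, patch)]


def sequence_length(num_frames, height, width, patch, with_cls=True):
    """Tokens seen by each attention layer: T * N, plus one for CLS."""
    n = (height // patch) * (width // patch)
    return num_frames * n + (1 if with_cls else 0)


# Toy example with made-up sizes: T = 4 frames of 4 x 4 pixels, 2 x 2 patches.
frames = [[[t * 16 + r * 4 + c for c in range(4)] for r in range(4)] for t in range(4)]
tokens = space_time_tokens(frames, patch=2)
seq_len = sequence_length(num_frames=4, height=4, width=4, patch=2)
attention_pairs = seq_len ** 2  # every token attends to every token
```

Because every token attends to every other token across both space and time, the attention score matrix grows quadratically in $T \cdot N + 1$, so full 3D attention is most practical when the token count is small.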