DINOv3 as a Frozen Encoder for CRPS-Oriented Probabilistic Rainfall Nowcasting
Luciano Araujo Dourado Filho, Almir Moreira da Silva Neto, Anthony Miyaguchi, Rodrigo Pereira David, Rodrigo Tripodi Calumby, Lukáš Picek
TL;DR
The paper tackles calibrated probabilistic nowcasting of 4-hour accumulated rainfall from satellite data by combining a frozen DINOv3 satellite encoder with a lightweight V-JEPA projector that outputs a discrete empirical CDF (eCDF) over rainfall bins, trained with a discrete $L_{RPS}$ objective. It benchmarks this approach against compact 3D-UNET baselines with both eCDF and Gamma-Hurdle probabilistic heads on the Weather4Cast 2025 dataset, highlighting the advantages of a frozen backbone and fine-grained discretization. The key finding is that the DINOv3–V-JEPA architecture achieves the best CRPS, $3.5102$, while the UNET-based models lag behind; among the UNET variants, the per-pixel Gamma-Hurdle head performs best. The results support the practicality of pairing pretrained world-model encoders with small probabilistic heads for efficient, calibrated nowcasting under sparsity and domain shift, although performance remains sensitive to binning choices and training dynamics.
Abstract
This paper proposes a competitive and computationally efficient approach to probabilistic rainfall nowcasting. A video projector (V-JEPA Vision Transformer) paired with a lightweight probabilistic head is attached to a pre-trained satellite vision encoder (DINOv3-SAT493M) to map encoder tokens into a discrete empirical CDF (eCDF) over 4-hour accumulated rainfall. The projector and head are optimized end-to-end by minimizing the Ranked Probability Score (RPS). For comparison, 3D-UNET baselines are trained with an aggregate Ranked Probability Score objective and with a per-pixel Gamma-Hurdle objective. On the Weather4Cast 2025 benchmark, the proposed method achieves a CRPS of 3.5102, an improvement of $\approx$ 26% over the best 3D-UNET baseline.
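To make the scoring idea concrete, the following is a minimal sketch of how a discrete eCDF over rainfall bins could be scored against an observation. It is not the authors' implementation: the function name `discrete_crps`, the bin edges, and the width-weighted sum (one common way to approximate CRPS from a binned CDF) are all assumptions made for illustration.

```python
import numpy as np

def discrete_crps(probs, obs, edges):
    """Approximate CRPS from per-bin probabilities (illustrative sketch).

    probs: (..., K) probabilities over K rainfall bins, summing to 1.
    obs:   (...)    observed accumulated rainfall.
    edges: (K+1,)   increasing bin edges (hypothetical values, not from the paper).
    """
    probs = np.asarray(probs, dtype=float)
    obs = np.asarray(obs, dtype=float)
    edges = np.asarray(edges, dtype=float)
    # Predicted CDF evaluated at the upper edge of each bin.
    cdf_pred = np.cumsum(probs, axis=-1)
    # Observed CDF is a step function: 1 once the threshold reaches the observation.
    cdf_obs = (edges[1:] >= obs[..., None]).astype(float)
    # Sum squared CDF differences, weighted by bin width.
    widths = np.diff(edges)
    return np.sum(widths * (cdf_pred - cdf_obs) ** 2, axis=-1)

# Hypothetical bins: [0, 1), [1, 2), [2, 4) in rainfall units.
edges = np.array([0.0, 1.0, 2.0, 4.0])
sharp = discrete_crps([0.0, 1.0, 0.0], 1.5, edges)      # all mass in the correct bin
uniform = discrete_crps([1 / 3, 1 / 3, 1 / 3], 1.5, edges)  # uninformative forecast
```

In this toy setup the forecast that concentrates its mass in the bin containing the observation scores lower (better) than the uniform one, which is the behaviour an RPS-style objective rewards. Dropping the `widths` factor would give the unweighted RPS over bins.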
