PA-Net: Precipitation-Adaptive Mixture-of-Experts for Long-Tail Rainfall Nowcasting

Xinyu Xiao; Sen Lei; Eryun Liu; Shiming Xiang; Hao Li; Cheng Yuan; Yuan Qi; Qizhao Jin

Abstract

Precipitation nowcasting is vital for flood warning, agricultural management, and emergency response, yet two bottlenecks persist: the prohibitive cost of modeling million-scale spatiotemporal tokens from multi-variate atmospheric fields, and the extreme long-tailed rainfall distribution, in which heavy-to-torrential events (those of greatest societal impact) constitute fewer than 0.1% of all samples. We propose the Precipitation-Adaptive Network (PA-Net), a Transformer framework whose computational budget is explicitly governed by rainfall intensity. Its core component, Precipitation-Adaptive MoE (PA-MoE), dynamically scales the number of activated experts per token according to local precipitation magnitude, channeling richer representational capacity toward the rare yet critical heavy-rainfall tail. A Dual-Axis Compressed Latent Attention mechanism factorizes spatiotemporal attention with convolutional reduction to manage massive context lengths, while an intensity-aware training protocol progressively amplifies learning signals from extreme-rainfall samples. Experiments on ERA5 demonstrate consistent improvements over state-of-the-art baselines, with particularly significant gains in heavy-rain and rainstorm regimes.
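The idea of scaling the number of activated experts with local precipitation magnitude can be illustrated with a small NumPy sketch. This is not the authors' implementation: the intensity thresholds, the per-bin expert counts, and the function names `experts_per_token` and `adaptive_gates` are all hypothetical, chosen only to show one way an intensity-dependent top-k router could behave.

```python
import numpy as np

# Hypothetical intensity bins (mm/h) and expert counts; NOT taken from the paper.
RAIN_THRESHOLDS = np.array([0.1, 2.5, 8.0])   # none | light | moderate | heavy
EXPERTS_PER_BIN = np.array([1, 2, 3, 4])      # experts activated in each bin

def experts_per_token(rain):
    """Map each token's rainfall magnitude to a number of active experts."""
    return EXPERTS_PER_BIN[np.digitize(rain, RAIN_THRESHOLDS)]

def adaptive_gates(logits, k):
    """Softmax over each token's top-k[i] router logits; all other gates are zero."""
    n_tokens, n_experts = logits.shape
    order = np.argsort(-logits, axis=1)        # expert indices, best score first
    rank = np.empty_like(order)
    # rank[i, e] = position of expert e in token i's sorted list
    rank[np.arange(n_tokens)[:, None], order] = np.arange(n_experts)
    active = rank < k[:, None]                 # keep only the top-k[i] experts
    masked = np.where(active, logits, -np.inf)
    masked = masked - masked.max(axis=1, keepdims=True)  # numerical stability
    weights = np.exp(masked)                   # exp(-inf) == 0 for inactive experts
    return weights / weights.sum(axis=1, keepdims=True)

# Four tokens with increasing rainfall and random router scores over four experts.
rain = np.array([0.0, 1.0, 5.0, 20.0])
k = experts_per_token(rain)
logits = np.random.default_rng(0).normal(size=(4, 4))
gates = adaptive_gates(logits, k)
```

Under these assumed bins, a rainless token is served by a single expert while the heaviest-rain token mixes all four, so the per-token compute grows with intensity. How PA-MoE actually maps intensity to expert count is specified in the paper's methodology, not here.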

Paper Structure (30 sections, 28 equations, 3 figures, 5 tables)

Introduction
Related Work
Data-Driven Precipitation Nowcasting
Tokenization in Deep Learning
Background
Task Definition
Open Difficulties in Precipitation Nowcasting
Preliminaries
Transformer Architecture
Mixture-of-Experts Paradigm
Methodology
PA-Net Backbone Architecture
Patch Tokenization
Solar-Geometric Positional Encoding
Dual-Axis Compressed Latent Attention
...and 15 more sections

Figures (3)

Figure 1: Architecture of the Precipitation-Adaptive Network (PA-Net): a Transformer backbone equipped with Dual-Axis Compressed Latent Attention (DACLA) for scalable spatiotemporal modeling and Precipitation-Adaptive MoE (PA-MoE) for rainfall-intensity-driven expert allocation, jointly enabling enhanced nowcasting skill from light drizzle through torrential downpours.
Figure 2: Overview of the Precipitation-Adaptive MoE (PA-MoE): tokens associated with intense rainfall activate a larger expert ensemble, while tokens with no rain or light rain are routed to fewer experts, thereby concentrating representational capacity on the rare yet high-impact tail of the precipitation distribution.
Figure 3: Six-hour precipitation forecast visualization on the ERA5 dataset. Rows from top to bottom: PA-Net and ground truth. Orange colors indicate heavier rainfall.
