Rainy: Unlocking Satellite Calibration for Deep Learning in Precipitation
Zhenyu Yu, Hanqing Chen, Mohd Yamani Idna Idris, Pei Wang
TL;DR
The paper addresses the calibration of satellite-based precipitation estimates when ground truth is sparse. It introduces the Rainy dataset, a multi-source, long-term spatiotemporal benchmark that combines IMERG-Late satellite data with CMPA station measurements. It also presents Taper Loss, a distance-weighted objective that emphasizes reliable in-situ observations during satellite calibration, formulated as $\mathcal{L}_{\text{Taper}} = \sum_{j=1}^{N} K(d_j) (\mathbf{I}_1(x_j,y_j) - z_j)^2$, with a normalized variant for balanced contributions. Rainy serves as a standardized benchmark across five tasks: satellite calibration, precipitation event prediction, precipitation level prediction, spatiotemporal forecasting, and downscaling. In the reported experiments, Taper Loss improves calibration and spatiotemporal accuracy, DiffIR and SwinIR advance downscaling performance, and UNet and MLP variants excel in specific prediction tasks. The work supports AI-for-science applications in quantitative remote sensing (QRS), promoting cross-disciplinary collaboration and more reliable precipitation analysis at regional to global scales.
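To make the Taper Loss formula concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes that $\mathbf{I}_1(x_j,y_j)$ is the calibrated estimate sampled at station $j$, that $z_j$ is the station measurement, and that $d_j$ is a non-negative distance supplied by the caller. The summary does not specify the kernel $K$ or the exact form of the normalized variant, so the Gaussian kernel, the `sigma` parameter, and the division by $\sum_j K(d_j)$ below are illustrative assumptions, and the function names are hypothetical.

```python
import numpy as np


def gaussian_kernel(d, sigma=1.0):
    """Hypothetical taper kernel K(d): the weight decays as distance d grows.

    The Gaussian form is an assumption; the source only states that the
    loss is distance-weighted.
    """
    d = np.asarray(d, dtype=float)
    return np.exp(-(d ** 2) / (2.0 * sigma ** 2))


def taper_loss(pred_at_stations, station_obs, distances, sigma=1.0, normalize=False):
    """Distance-weighted squared error: sum_j K(d_j) * (I_1(x_j, y_j) - z_j)^2.

    pred_at_stations: calibrated estimates sampled at the N station locations
    station_obs:      in-situ measurements z_j at those stations
    distances:        d_j for each station (units must match sigma)
    normalize:        if True, divide by sum_j K(d_j) (an assumed form of the
                      normalized variant, which the source does not spell out)
    """
    pred = np.asarray(pred_at_stations, dtype=float)
    obs = np.asarray(station_obs, dtype=float)
    weights = gaussian_kernel(distances, sigma)
    loss = np.sum(weights * (pred - obs) ** 2)
    if normalize:
        loss = loss / np.sum(weights)
    return float(loss)
```

With all distances set to zero, every weight equals 1 and the loss reduces to the plain sum of squared errors. As a distance grows, the corresponding station contributes less, which is the behavior the distance weighting is meant to capture.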
Abstract
Precipitation plays a critical role in the Earth's hydrological cycle, directly affecting ecosystems, agriculture, and water resource management. Accurate precipitation estimation and prediction are crucial for understanding climate dynamics, disaster preparedness, and environmental monitoring. In recent years, artificial intelligence (AI) has gained increasing attention in quantitative remote sensing (QRS), enabling more advanced data analysis and improving precipitation estimation accuracy. Although traditional methods have been widely used for precipitation estimation, they are limited by the difficulty of data acquisition and the challenge of capturing complex feature relationships. Furthermore, the lack of standardized multi-source satellite datasets and, in most cases, the exclusive reliance on station data significantly hinder the effective application of advanced AI models. To address these challenges, we propose the Rainy dataset, a multi-source spatiotemporal dataset that integrates pure satellite data with station data, together with Taper Loss, which is designed to fill the gap in tasks where only in-situ data is available without area-wide support. The Rainy dataset supports five main tasks: (1) satellite calibration, (2) precipitation event prediction, (3) precipitation level prediction, (4) spatiotemporal prediction, and (5) precipitation downscaling. For each task, we selected benchmark models and evaluation metrics to provide valuable references for researchers. Using precipitation as an example, the Rainy dataset and Taper Loss demonstrate the seamless collaboration between QRS and computer vision, offering data support for AI for Science in the field of QRS and providing valuable insights for interdisciplinary collaboration and integration.
