Rainy: Unlocking Satellite Calibration for Deep Learning in Precipitation

Zhenyu Yu, Hanqing Chen, Mohd Yamani Idna Idris, Pei Wang

TL;DR

The paper addresses the calibration of satellite-based precipitation estimates when ground truth is sparse. It introduces the Rainy dataset, a multi-source, long-term spatiotemporal benchmark that combines IMERG-Late satellite data with CMPA station measurements. It also presents Taper Loss, a distance-weighted objective that emphasizes reliable in-situ observations during satellite calibration, formulated as $\mathcal{L}_{\text{Taper}} = \sum_{j=1}^{N} K(d_j) (\mathbf{I}_1(x_j,y_j) - z_j)^2$, with a normalized variant for balanced contributions. Across five tasks (satellite calibration, event and level prediction, spatiotemporal forecasting, and downscaling), Rainy provides a standardized benchmark and Taper Loss improves calibration and spatiotemporal accuracy, while DiffIR and SwinIR advance downscaling performance and UNet and MLP variants excel in specific prediction tasks. The work supports AI-for-science applications in quantitative remote sensing (QRS), promoting cross-disciplinary collaboration and more reliable precipitation analysis at regional to global scales.

Abstract

Precipitation plays a critical role in the Earth's hydrological cycle, directly affecting ecosystems, agriculture, and water resource management. Accurate precipitation estimation and prediction are crucial for understanding climate dynamics, disaster preparedness, and environmental monitoring. In recent years, artificial intelligence (AI) has gained increasing attention in quantitative remote sensing (QRS), enabling more advanced data analysis and improving precipitation estimation accuracy. Although traditional methods have been widely used for precipitation estimation, they face limitations due to the difficulty of data acquisition and the challenge of capturing complex feature relationships. Furthermore, the lack of standardized multi-source satellite datasets, and in most cases, the exclusive reliance on station data, significantly hinders the effective application of advanced AI models. To address these challenges, we propose the Rainy dataset, a multi-source spatio-temporal dataset that integrates pure satellite data with station data, and propose Taper Loss, designed to fill the gap in tasks where only in-situ data is available without area-wide support. The Rainy dataset supports five main tasks: (1) satellite calibration, (2) precipitation event prediction, (3) precipitation level prediction, (4) spatiotemporal prediction, and (5) precipitation downscaling. For each task, we selected benchmark models and evaluation metrics to provide valuable references for researchers. Using precipitation as an example, the Rainy dataset and Taper Loss demonstrate the seamless collaboration between QRS and computer vision, offering data support for AI for Science in the field of QRS and providing valuable insights for interdisciplinary collaboration and integration.

Paper Structure

This paper contains 16 sections, 2 theorems, 6 equations, 3 figures, 6 tables.

Key Result

Theorem 4.5

The Taper Loss is defined as a weighted mean squared error, where the kernel function $K(d_j)$ down-weights the influence of interpolated points, focusing on minimizing the error at reliable points: $\mathcal{L}_{\text{Taper}} = \sum_{j=1}^{N} K(d_j) (\mathbf{I}_1(x_j,y_j) - z_j)^2$, where $K(d_j)$ is the kernel function weighting the error based on the distance $d_j$ to the nearest reliable observation. $(x_j, y_j)$ represents the coordinates of the $j$-th reliable ground observa
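
As a rough illustration of the weighted squared error described in this theorem, the following NumPy sketch computes a Taper-style loss over a set of points. Several details are assumptions and do not come from the paper: the exponential kernel is taken to be $K(d) = \beta e^{-\alpha d}$ (the summary only names an exponential function with parameters $\alpha$ and $\beta$), the normalized variant is assumed to divide by the sum of the kernel weights, and the function names `exponential_kernel` and `taper_loss` are hypothetical.

```python
import numpy as np


def exponential_kernel(d, alpha=1.0, beta=1.0):
    """Assumed kernel form K(d) = beta * exp(-alpha * d); not taken from the paper."""
    return beta * np.exp(-alpha * np.asarray(d, dtype=float))


def taper_loss(pred, target, dist, alpha=1.0, beta=1.0, normalize=False):
    """Distance-weighted squared error: sum_j K(d_j) * (pred_j - target_j)^2.

    pred   : calibrated satellite values I_1(x_j, y_j)
    target : reference values z_j
    dist   : distance d_j from each point to the nearest reliable observation
    """
    pred = np.asarray(pred, dtype=float)
    target = np.asarray(target, dtype=float)
    weights = exponential_kernel(dist, alpha, beta)
    loss = np.sum(weights * (pred - target) ** 2)
    if normalize:
        # Assumed normalization: divide by the total kernel weight.
        loss = loss / np.sum(weights)
    return float(loss)


# Toy usage: two points at stations (distance 0) and one point one unit away.
pred = [1.0, 2.0, 4.0]
target = [1.0, 1.0, 2.0]
dist = [0.0, 0.0, 1.0]
print(taper_loss(pred, target, dist, alpha=np.log(2)))
print(taper_loss(pred, target, dist, alpha=np.log(2), normalize=True))
```

Under this assumed kernel, points at a station receive weight $\beta$, and the weight decays with distance at a rate set by $\alpha$, so errors at interpolated points contribute less to the total.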

Figures (3)

  • Figure 1: Motivation of this study. Satellite data provide wide coverage but lack ground-truth accuracy, while in-situ observations are sparse but reliable. The challenge is to integrate these data sources effectively for precipitation estimation. The proposed Rainy dataset and Taper Loss address this issue by leveraging deep learning to improve satellite calibration in regions with limited ground stations.
  • Figure 2: Overview of the Rainy dataset, including CMPA, CMPA station, and Late data. The full data represents the complete precipitation dataset over China (440 $\times$ 700 pixels), while the partial data corresponds to a selected region from the Rainy dataset (256 $\times$ 256 pixels). The dataset is available in both daily and hourly versions, forming a long-term time series (0.1$^\circ$/pixel). The full stations dataset includes all stations across China from 2015 to 2017, though the specific stations available may vary from one sample to another. Notably, in CMPA data, only the stations provide accurate precipitation values. At certain times, such as August 3, 2015, CMPA data lacks station coverage.
  • Figure 3: Ablation study of the Taper Loss parameter $\alpha$ on the hourly data, using the exponential function as an example and evaluating the impact on RMSE, MAE, and $R^2$. The parameter $\beta$ is set to 1.

Theorems & Definitions (7)

  • Definition 4.1: Satellite and ground data
  • Definition 4.2: Reliable observation stations
  • Definition 4.4: Distance-based kernel function
  • Theorem 4.5: Taper loss function
  • Remark 4.6: Normalization for multiple reliable points
  • Theorem 4.7: Optimization objective
  • Definition 4.8: Total loss function