Self-Supervised Learning with Probabilistic Density Labeling for Rainfall Probability Estimation

Junha Lee, Sojung An, Sujeong You, Namik Cho

TL;DR

This work tackles rainfall probability estimation by post-processing NWP forecasts with a self-supervised framework. It introduces SSLPDL, which leverages masked modeling and a deformable convolution encoder to learn variable dependencies, followed by transfer learning to precipitation segmentation. A key contribution is probabilistic density labeling, which smooths class probabilities near rainfall thresholds to mitigate heavy-rain imbalance. Experiments on the RDAPS dataset show SSLPDL improves spatiotemporal bias correction and extends forecast lead times, highlighting practical gains for regional rainfall prediction and extreme-event awareness.

Abstract

Numerical weather prediction (NWP) models are fundamental in meteorology for simulating and forecasting the behavior of various atmospheric variables. The accuracy of precipitation forecasts and the acquisition of sufficient lead time are crucial for preventing hazardous weather events. However, the performance of NWP models is limited by the nonlinear and unpredictable patterns of extreme weather phenomena driven by temporal dynamics. In this regard, we propose Self-Supervised Learning with Probabilistic Density Labeling (SSLPDL) for estimating rainfall probability by post-processing NWP forecasts. Our post-processing method uses self-supervised learning (SSL) with masked modeling for reconstructing atmospheric physics variables, enabling the model to learn the dependency between variables. The pre-trained encoder is then utilized in transfer learning to a precipitation segmentation task. Furthermore, we introduce a straightforward labeling approach based on probability density to address the class imbalance in extreme weather phenomena like heavy rain events. Experimental results show that SSLPDL surpasses other precipitation forecasting models in regional precipitation post-processing and demonstrates competitive performance in extending forecast lead times. Our code is available at https://github.com/joonha425/SSLPDL
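
The masked-modeling pre-training described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the patch-based masking scheme, the mask ratio, the zero fill value, and the function names `random_patch_mask` and `masked_reconstruction_loss` are all assumptions made for illustration, and the encoder is replaced by a placeholder that simply returns the masked input.

```python
import numpy as np


def random_patch_mask(height, width, patch, mask_ratio, rng):
    """Return a boolean (height, width) mask; True marks hidden pixels.

    Assumption: masking is done on square patches, as in common
    masked-image-modeling setups. The paper may mask differently.
    """
    assert height % patch == 0 and width % patch == 0
    grid_h, grid_w = height // patch, width // patch
    n_patches = grid_h * grid_w
    n_masked = int(round(mask_ratio * n_patches))
    flat = np.zeros(n_patches, dtype=bool)
    flat[rng.choice(n_patches, size=n_masked, replace=False)] = True
    grid = flat.reshape(grid_h, grid_w)
    # Expand each patch-level decision to pixel resolution.
    return grid.repeat(patch, axis=0).repeat(patch, axis=1)


def masked_reconstruction_loss(x, x_hat, mask):
    """Mean squared error over masked pixels only.

    x, x_hat: arrays of shape (variables, height, width).
    mask:     boolean array of shape (height, width).
    """
    squared_error = (x_hat - x) ** 2
    return squared_error[:, mask].mean()


rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8, 8))  # 4 hypothetical atmospheric variables
mask = random_patch_mask(8, 8, patch=2, mask_ratio=0.5, rng=rng)
x_masked = np.where(mask, 0.0, x)  # hide the masked pixels
x_hat = x_masked  # placeholder for encoder + decoder output
loss = masked_reconstruction_loss(x, x_hat, mask)
```

Because the loss is computed only on hidden pixels, a real model has to infer them from the visible context across all variables, which is one way to encourage it to learn the dependency between variables.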

Paper Structure

This paper contains 15 sections, 4 equations, 5 figures, 6 tables, 1 algorithm.

Figures (5)

  • Figure 1: Our SSLPDL improves NWP forecast accuracy and secures extended lead times. NWP forecasts refer to the predictions of future weather variables (e.g., temperature, humidity, rain) generated by NWP models. The precipitation dataset from the NWP models was extracted for post-processing with a 24-hour lead time, using forecasts from 25 to 30 hours. Our approach consistently improves corrected rainfall across all evaluated lead times, showcasing its robustness and reliability in enhancing forecast accuracy.
  • Figure 2: The overall structure of SSLPDL, a two-stage process for representing spatiotemporal bias in forecasts: i) pre-training and ii) the downstream task. Pre-training learns the dependencies between variables through a reconstruction task. An encoder based on deformable convolution layers [wang:2023] captures and represents the spatial features of neighboring pixels. Note that the deformable convolution aggregates spatial features from the surrounding pixels to predict spatiotemporal bias. The downstream task uses the pre-trained encoder and probabilistic density labeling to estimate rainfall probability.
  • Figure 3: Visual comparison of benchmarks on August 15, 2022 at 18 UTC (+29 h). Colors represent each group (group 1: white, group 2: blue, and group 3: red). As the stationary front moved southward, cold air from the northwest entered the upper atmosphere while recent rainfall left abundant moisture near the surface, raising temperatures. This caused atmospheric instability, resulting in sporadic showers typical of localized precipitation events. Despite regional variations in rainfall intensity and amount, the proposed method accurately predicts concentrated heavy rain. * denotes that probabilistic density labeling is applied.
  • Figure 4: Ablation study on the hyperparameters for optimizing probabilistic density labeling. The y-axis represents the mIoU. (a) Gradient $\alpha$ of label smoothing. (b) Ratio $\beta$ of the loss function between one-hot labels $y$ and probabilistic density labels $y^{*}$.
  • Figure 5: Monthly analysis of the impact of our SSLPDL on RDAPS.
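
The roles of $\alpha$ and $\beta$ in the Figure 4 caption can be made concrete with a small sketch. The exact labeling formula is not given in this summary, so everything below is an assumption: the soft labels $y^{*}$ are built from sigmoid exceedance probabilities around each rainfall threshold, with $\alpha$ as the slope, and $\beta$ is used as a convex weight between the cross-entropy on one-hot labels $y$ and the cross-entropy on $y^{*}$. The threshold values (0.1 and 10.0) and all function names are hypothetical; the three classes mirror the three groups in Figure 3.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def one_hot_labels(rain, thresholds):
    """Hard labels y: class k means rain fell between thresholds k-1 and k."""
    idx = np.digitize(np.asarray(rain, dtype=float), thresholds)
    return np.eye(len(thresholds) + 1)[idx]


def density_labels(rain, thresholds, alpha):
    """Soft labels y* that vary smoothly near each threshold (assumed form).

    exceed[..., k] is a soft probability that rain is above thresholds[k];
    adjacent differences turn these into class probabilities that sum to 1.
    A larger alpha gives a steeper transition, closer to one-hot labels.
    """
    rain = np.asarray(rain, dtype=float)
    t = np.asarray(thresholds, dtype=float)
    exceed = sigmoid(alpha * (rain[..., None] - t))
    ones = np.ones(rain.shape + (1,))
    zeros = np.zeros(rain.shape + (1,))
    upper = np.concatenate([ones, exceed], axis=-1)
    lower = np.concatenate([exceed, zeros], axis=-1)
    return upper - lower


def cross_entropy(target, probs, eps=1e-12):
    return float(-(target * np.log(probs + eps)).sum(axis=-1).mean())


def mixed_loss(probs, y, y_star, beta):
    """Assumed combination: beta weights the one-hot term, 1 - beta the soft term."""
    return beta * cross_entropy(y, probs) + (1.0 - beta) * cross_entropy(y_star, probs)


thresholds = (0.1, 10.0)  # hypothetical rainfall thresholds
rain = np.array([0.0, 10.0, 50.0])
y = one_hot_labels(rain, thresholds)
y_star = density_labels(rain, thresholds, alpha=1.0)
probs = np.full((3, 3), 1.0 / 3.0)  # placeholder model output
loss = mixed_loss(probs, y, y_star, beta=0.5)
```

In this sketch, a sample exactly at the heavy-rain threshold receives half of its probability mass in the heavy-rain class instead of an all-or-nothing label, which is the kind of smoothing near thresholds that the TL;DR describes.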