
Self-Supervised Pre-Training for Precipitation Post-Processor

Sojung An, Junha Lee, Jiyeon Jang, Inchae Na, Wooyeon Park, Sujeong You

TL;DR

This work addresses the challenge of extending forecast lead times for local precipitation, especially heavy rainfall, which climate change makes increasingly difficult to predict. It introduces a self-supervised pre-training scheme in which an encoder learns physics-informed latent representations by reconstructing masked 3D atmospheric variables; the pre-trained encoder is then transferred to a precipitation segmentation task. A continuous labeling strategy mitigates extreme class imbalance by training on smoothed probabilistic targets instead of one-hot labels. Experiments on post-processing a regional NWP model show improved heavy-rain detection and localization, outperforming baselines such as MetNet and RDAPS, with practical implications for more reliable short-term precipitation forecasts.
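
The summary does not spell out the smoothing rule behind continuous labeling. As a minimal sketch, assuming the soft targets are obtained by averaging one-hot labels over a small spatial neighbourhood, a hard class map could be turned into per-pixel class probabilities as below. The function name `continuous_labels`, the `radius` parameter, and the class indices (for example 0 = no rain, 2 = heavy rain) are hypothetical and not the authors' definitions.

```python
from typing import List

Grid = List[List[int]]                 # hard label map: one class index per pixel
SoftGrid = List[List[List[float]]]     # soft label map: per-pixel class probabilities


def continuous_labels(labels: Grid, num_classes: int, radius: int = 1) -> SoftGrid:
    """Hypothetical smoothing: average the one-hot vectors in a
    (2*radius+1) x (2*radius+1) window, clipped at the grid border."""
    height, width = len(labels), len(labels[0])
    soft: SoftGrid = []
    for i in range(height):
        row = []
        for j in range(width):
            counts = [0.0] * num_classes
            n = 0
            # Count class occurrences among in-bounds neighbours (including the pixel itself).
            for di in range(-radius, radius + 1):
                for dj in range(-radius, radius + 1):
                    ii, jj = i + di, j + dj
                    if 0 <= ii < height and 0 <= jj < width:
                        counts[labels[ii][jj]] += 1.0
                        n += 1
            # Normalize counts so each pixel's probabilities sum to 1.
            row.append([c / n for c in counts])
        soft.append(row)
    return soft


if __name__ == "__main__":
    # A single heavy-rain pixel (class 2) surrounded by no-rain pixels (class 0).
    labels = [[0, 0, 0], [0, 2, 0], [0, 0, 0]]
    print(continuous_labels(labels, num_classes=3)[1][1])
```

Under this assumption, an isolated heavy-rain pixel spreads probability to its neighbours, so the rare class receives non-zero probability on more pixels than it would under one-hot labels; `radius=0` recovers the original one-hot targets.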

Abstract

Obtaining a sufficient forecast lead time for local precipitation is essential for mitigating damage from hazardous weather events. Climate change driven by global warming makes severe precipitation events, such as heavy rainfall, increasingly difficult to predict accurately. In this paper, we propose a deep learning-based precipitation post-processor for numerical weather prediction (NWP) models. Training the post-processor consists of two steps: (i) self-supervised pre-training, in which the encoder parameters are learned by reconstructing masked variables from the atmospheric physics domain; and (ii) transfer learning from the pre-trained encoder to precipitation segmentation tasks (the target domain). In addition, we introduce a heuristic labeling approach for training effectively on class-imbalanced datasets. Our experiments on precipitation correction for a regional NWP model show that the proposed method outperforms the compared approaches.
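
The pre-training step can be illustrated with a masked-reconstruction objective in the style of masked autoencoders. The sketch below is an illustration under stated assumptions, not the authors' code: it flattens one 2-D slice of a normalized variable into a list, hides a fixed fraction of pixels (90% matches the masking ratio shown in Figure 3), and scores a reconstruction only on the hidden pixels. The helper names, the fill value of 0.0, and restricting the loss to masked pixels are hypothetical choices; the encoder and decoder themselves are omitted.

```python
import random
from typing import List

Field = List[float]  # one flattened 2-D slice of a normalized atmospheric variable


def random_mask(num_pixels: int, mask_ratio: float, rng: random.Random) -> List[bool]:
    """Hypothetical helper: True marks a hidden pixel; round(ratio * n) pixels are hidden."""
    num_masked = round(mask_ratio * num_pixels)
    hidden = set(rng.sample(range(num_pixels), num_masked))
    return [k in hidden for k in range(num_pixels)]


def apply_mask(field: Field, mask: List[bool], fill: float = 0.0) -> Field:
    """Replace hidden pixels with a fill value; this masked field is the encoder input."""
    return [fill if m else x for x, m in zip(field, mask)]


def masked_mse(pred: Field, target: Field, mask: List[bool]) -> float:
    """Mean squared reconstruction error over hidden pixels only (an MAE-style assumption)."""
    errors = [(p - t) ** 2 for p, t, m in zip(pred, target, mask) if m]
    return sum(errors) / len(errors) if errors else 0.0


if __name__ == "__main__":
    rng = random.Random(0)
    field = [float(k) for k in range(10)]
    mask = random_mask(len(field), mask_ratio=0.9, rng=rng)
    encoder_input = apply_mask(field, mask)
    # A real model would map encoder_input to a reconstruction; here one is faked.
    reconstruction = [x + 0.5 for x in field]
    print(sum(mask), masked_mse(reconstruction, field, mask))
```

Scoring only the hidden pixels forces the model to infer them from the visible context rather than copy its input, which is the usual motivation for this choice in masked-reconstruction pre-training.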

Paper Structure

This paper contains 8 sections, 2 equations, 4 figures, 1 table, and 1 algorithm.

Figures (4)

  • Figure 1: Process of learning the precipitation post-processor. Training has two phases: 1) pre-training the encoder and decoder on a reconstruction task in which the inputs are masked, and 2) training the decoder for precipitation prediction on top of the pre-trained encoder. During this main training phase, the encoder learned in pre-training is reused with its weights fixed.
  • Figure 2: Proportion of each label in the training dataset: (a) the original labels; (b) the proportion of pixels with non-zero probability after smoothing with the proposed continuous labeling method.
  • Figure 3: Variable reconstruction results of the pre-trained model on data from August 10, 2022 at 00 UTC. The first row shows the normalized variables, the second row shows the same variables with 90% of the pixels masked, and the third row shows the reconstructions of the masked pixels. For visualization, masked values were set to -100 and the display range was set to (-10, 10). The number beside each variable name indicates its vertical level.
  • Figure 4: Qualitative comparison of model predictions for August 10, 2022 at 00 UTC. Each panel shows precipitation accumulated over a 1-hour period. Owing to a stagnant front, rain fell over most of Korea; daily rainfall averaged 100 to 200 mm, and the maximum exceeded 300 mm.
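
Figure 2 contrasts label proportions for the original labels and for the smoothed labels. A minimal sketch of how such proportions could be computed is shown below, assuming hard labels are stored as a 2-D grid of class indices and smoothed labels as a grid of per-class probability lists. Both layouts, the function names, and the zero threshold `eps` are hypothetical.

```python
from typing import List


def hard_label_proportions(labels: List[List[int]], num_classes: int) -> List[float]:
    """Fraction of pixels assigned to each class in a hard (one-hot) label map."""
    flat = [c for row in labels for c in row]
    return [flat.count(k) / len(flat) for k in range(num_classes)]


def nonzero_proportions(soft: List[List[List[float]]], eps: float = 0.0) -> List[float]:
    """Fraction of pixels whose smoothed probability for each class exceeds eps."""
    flat = [p for row in soft for p in row]
    num_classes = len(flat[0])
    return [sum(1 for p in flat if p[k] > eps) / len(flat) for k in range(num_classes)]


if __name__ == "__main__":
    labels = [[0, 0], [0, 2]]
    soft = [[[1.0, 0.0, 0.0], [0.5, 0.0, 0.5]],
            [[0.75, 0.0, 0.25], [0.5, 0.0, 0.5]]]
    print(hard_label_proportions(labels, num_classes=3))
    print(nonzero_proportions(soft))
```

Unlike the hard proportions, the smoothed proportions need not sum to 1, because a single pixel can carry non-zero probability for several classes at once.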