Learning Data-driven Surrogate and Correction Models for Satellite Observations in Numerical Weather Prediction

Gian Luca Buono, Stefanie Hollborn, Roland Potthast, Jörg Schäfer, Martin Simon

Abstract

Satellite observations play a critical role in numerical weather prediction, where they are assimilated through an observation operator that maps model states to radiances. In the traditional Ensemble Kalman Filter, these observations are used to update the state by weighting their associated errors against model uncertainties to produce an optimal estimate. This process requires radiative transfer simulations for passive, downward-viewing satellite radiometers operating in the visible, infrared, and microwave spectra. Typically, such simulations rely on numerically integrating physical laws via models like RTTOV. In this paper, we introduce two machine learning surrogate observation operators inspired by modern computer-vision architectures: first, a fully data-driven emulator of radiative transfer, and second, a hybrid incremental correction model that learns only the residual relative to RTTOV, thereby retaining established physics while enabling data-driven refinement in complex conditions such as cloud-affected scenes. The residual formulation improves radiance accuracy (lower root mean squared error (RMSE) than both the fully data-driven emulator and RTTOV) and adds only moderate computational cost to the assimilation step. Both models combine 3D convolutions for vertical profile encoding with a 2D U-Net operating on latitude-longitude grids, allowing joint learning of vertical structure, spatial correlations, and inter-channel dependencies. We further provide a theoretical justification for deploying the hybrid surrogate as an observation operator in data assimilation.
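
The difference between the fully data-driven emulator and the hybrid residual model can be illustrated with a toy example. The sketch below is not the paper's architecture and does not use RTTOV: `h_base` is a hypothetical stand-in for a physics-based operator, `h_true` is a hypothetical "true" operator on a scalar state, and the hypothesis class is restricted to affine functions fitted by least squares. All names, coefficients, and the synthetic data are assumptions made for illustration only.

```python
import math
import random


def fit_affine(xs, ts):
    """Least-squares fit of t ~ a + b*x; returns (a, b)."""
    n = len(xs)
    mx = sum(xs) / n
    mt = sum(ts) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxt = sum((x - mx) * (t - mt) for x, t in zip(xs, ts))
    b = sxt / sxx
    return mt - b * mx, b


def rmse(preds, targets):
    """Root mean squared error between two equally long sequences."""
    n = len(targets)
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(preds, targets)) / n)


def h_base(x):
    # Hypothetical stand-in for a physics-based operator (NOT RTTOV).
    return math.exp(-2.0 * x)


def h_true(x):
    # Hypothetical "true" operator: the baseline plus a term it misses.
    return h_base(x) + 0.3 * x - 0.1


# Synthetic training data: noisy observations of the true operator.
rng = random.Random(0)
x_train = [rng.random() for _ in range(200)]
y_train = [h_true(x) + rng.gauss(0.0, 0.01) for x in x_train]

# Noise-free test grid on [0, 1].
x_test = [i / 50 for i in range(51)]
y_test = [h_true(x) for x in x_test]

# Fully data-driven emulator: fit the observations directly.
a_full, b_full = fit_affine(x_train, y_train)

# Residual model: fit only the mismatch between observations and baseline.
residuals = [y - h_base(x) for x, y in zip(x_train, y_train)]
a_inc, b_inc = fit_affine(x_train, residuals)


def h_full(x):
    return a_full + b_full * x


def h_inc(x):
    # Baseline physics plus the learned correction.
    return h_base(x) + a_inc + b_inc * x


rmse_base = rmse([h_base(x) for x in x_test], y_test)
rmse_full = rmse([h_full(x) for x in x_test], y_test)
rmse_inc = rmse([h_inc(x) for x in x_test], y_test)

print(f"baseline RMSE: {rmse_base:.4f}")
print(f"direct   RMSE: {rmse_full:.4f}")
print(f"residual RMSE: {rmse_inc:.4f}")
```

In this constructed case the mismatch is affine, so it lies inside the hypothesis class, while the full operator does not; the residual fit therefore has the smallest error. This only illustrates the mechanism under these assumptions and says nothing about the magnitudes reported in the paper.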

Paper Structure (37 sections, 1 theorem, 29 equations, 9 figures, 10 tables)

This paper contains 37 sections, 1 theorem, 29 equations, 9 figures, 10 tables.

Key Result

Proposition 2.1

Assume the observation model, and define the operator mismatch relative to some baseline operator $\mathcal{H}$. Let $\mathcal{G}$ be a hypothesis class, and define the best-in-class direct and residual predictors. Define the approximation errors. Then

Figures (9)

  • Figure 1: Channel correlations of observed reflectance (obs_rad)
  • Figure 2: Our U-Net-based implementation including a convolutional autoencoder.
  • Figure 3: Training (orange) and validation (blue) loss (RMSE)
  • Figure 4: Model comparison of $\mathcal{H}_{\text{RTTOV}}$, $\mathcal{H}_{\text{ML-inc}}$, and $\mathcal{H}_{\text{ML-full}}$ against the satellite observation for sample 15, channel 5 from our testing dataset.
  • Figure 5: Model comparison of $\mathcal{H}_{\text{RTTOV}}$, $\mathcal{H}_{\text{ML-inc}}$, and $\mathcal{H}_{\text{ML-full}}$ against the satellite observation for sample 15, channel 3 from our unseen testing dataset.
  • ...and 4 more figures

Theorems & Definitions (6)

  • Remark 2.1
  • Remark 2.2
  • Proposition 2.1
  • proof
  • Remark 2.3
  • Remark 4.1