Controllable Probabilistic Forecasting with Stochastic Decomposition Layers

John S. Schreck, William E. Chapman, Charlie Becker, David John Gagne, Dhamma Kimpara, Nihanth Cherukuru, Judith Berner, Kirsten J. Mayer, Negin Sobhani

TL;DR

The paper introduces Stochastic Decomposition Layers (SDL), which convert deterministic weather models into calibrated probabilistic ensembles using hierarchical, scale-aware perturbations. Built on WXFormer, the approach combines a latent-driven style tensor, channel modulation, and per-pixel noise, and is trained with CRPS to achieve competitive skill at low computational overhead while enabling post-inference spread control through latent rescaling. The method supports reproducible ensemble generation and interpretable, multi-scale uncertainty decomposition, demonstrated on ERA5 data with favorable calibration metrics. Limitations include dependence on ERA5-derived uncertainty bounds and vertical resolution constraints, suggesting future work on out-of-distribution robustness and integration with physics-based stochastic parameterization.

Abstract

AI weather prediction ensembles that inject latent noise and are optimized with the continuous ranked probability score (CRPS) have produced accurate, well-calibrated predictions at far lower computational cost than diffusion-based methods. However, current CRPS ensemble approaches vary in their training strategies and noise injection mechanisms, and most inject noise globally throughout the network via conditional normalization. This structure increases training expense and limits the physical interpretability of the stochastic perturbations. We introduce Stochastic Decomposition Layers (SDL) for converting deterministic machine learning weather models into probabilistic ensemble systems. Adapted from StyleGAN's hierarchical noise injection, SDL applies learned perturbations at three decoder scales through latent-driven modulation, per-pixel noise, and channel scaling. When applied to WXFormer via transfer learning, SDL requires less than 2% of the computational cost needed to train the baseline model. Each ensemble member is generated from a compact latent tensor (5 MB), enabling perfect reproducibility and post-inference spread adjustment through latent rescaling. Evaluation on 2022 ERA5 reanalysis shows ensembles with spread-skill ratios approaching unity and rank histograms that progressively flatten toward uniformity through medium-range forecasts, achieving calibration competitive with operational IFS-ENS. Multi-scale experiments reveal hierarchical uncertainty: coarse layers modulate synoptic patterns while fine layers control mesoscale variability. The explicit latent parameterization provides interpretable uncertainty quantification for operational forecasting and climate applications.
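
The abstract relies on two verification quantities, CRPS and the spread-skill ratio. As a rough illustration only, the sketch below computes both for scalar forecasts using common textbook estimators. The function names `crps_ensemble` and `spread_skill_ratio` are hypothetical, and the paper's exact estimator, any area weighting, and any finite-ensemble correction are not given in this text, so those choices here are assumptions.

```python
from math import sqrt
from statistics import fmean, variance


def crps_ensemble(members, obs, fair=False):
    """Ensemble CRPS estimate for one scalar observation.

    CRPS = mean|x_i - y| - sum_ij|x_i - x_j| / denom, where
    denom = 2*M*M (standard) or 2*M*(M-1) (the "fair" variant).
    Which variant the paper trains with is an assumption left open here.
    """
    m = len(members)
    accuracy = fmean([abs(x - obs) for x in members])
    pair_sum = sum(abs(xi - xj) for xi in members for xj in members)
    denom = 2 * m * (m - 1) if fair else 2 * m * m
    return accuracy - pair_sum / denom


def spread_skill_ratio(ensembles, observations):
    """Ratio of ensemble spread to ensemble-mean RMSE over several cases.

    Spread is the square root of the mean sample variance of the members;
    skill is the RMSE of the ensemble mean. Values near 1 suggest that the
    spread matches the typical forecast error.
    """
    mean_variance = fmean([variance(e) for e in ensembles])
    mse = fmean([(fmean(e) - y) ** 2 for e, y in zip(ensembles, observations)])
    return sqrt(mean_variance / mse)


# Two forecast cases with two members each (toy numbers, not paper data).
toy_ensembles = [[0.0, 2.0], [1.0, 3.0]]
toy_observations = [2.0, 1.0]
toy_crps = [crps_ensemble(e, y) for e, y in zip(toy_ensembles, toy_observations)]
toy_ssr = spread_skill_ratio(toy_ensembles, toy_observations)
```

With a single member the standard estimator reduces to the absolute error, which is one way to see why CRPS is often described as a probabilistic generalization of mean absolute error.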

Paper Structure

This paper contains 29 sections, 3 equations, 10 figures, 3 tables.

Figures (10)

  • Figure 1: (a) Schematic of WXFormer with SDL injection points. Independent latent vectors $Z_1$, $Z_2$, and $Z_3$ are sampled from $\mathcal{N}(0,\mathbf{I})$ and fed to each of the three decoder SDL layers (yellow boxes) after upsampling blocks and before skip connection concatenation. Diamonds mark optional encoder SDL injection points not used in this work. Each SDL operates with independent learnable parameters to accommodate feature magnitude variations across network depth. (b) Data flow through the SDL. Latent vector $Z_\ell$ is transformed via a learned linear layer (red) to produce tensor $\mathbf{S}$. Three independent pathways (learned channel modulation at the top [$\mathbf{M}$], latent "style" tensor in the middle [$\mathbf{S}$], and per-pixel noise generation at the bottom [$\mathbf{R}$]) are combined multiplicatively before the residual addition to the input features. Tensor dimensions are shown for clarity; spatial broadcasting expands $\mathbf{S}$ and $\mathbf{M}$ to match $\mathbf{R}$.
  • Figure 2: Forecast verification metrics for selected atmospheric variables at key pressure levels over the 2022 ERA5 test period. Columns show geopotential height at 500 hPa (Z500), zonal wind at 700 hPa (U700), meridional wind at 850 hPa (V850), temperature at 850 hPa (T850), and specific humidity at 850 hPa (Q850). Rows display: (1) Deterministic Root Mean Square Error (RMSE) comparing the baseline deterministic WXFormer (solid line, 15-day forecasts) against IFS HRES (dashed line, 10-day forecasts); (2) Ensemble mean RMSE comparing SDL-WXFormer (solid line) to IFS-ENS (dashed line); (3) Continuous Ranked Probability Score (CRPS), where lower values indicate better probabilistic skill; (4) Spread-Skill Ratio (SSR), where values near 1.0 indicate well-calibrated ensemble spread; (5) SDL-WXFormer rank histogram frequency showing distribution of verification values among ranked ensemble members; (6) IFS-ENS rank histogram frequency for comparison. The two RMSE rows share the same $y$-axis range within each column for direct comparison between deterministic and ensemble performance. Colors in rank histograms denote forecast lead times: 6 h (purple), 5 days (blue), 10 days (orange), 15 days (green).
  • Figure 3: Vertical structure of ensemble forecast skill for core atmospheric variables across all 16 model levels. Columns show zonal wind (U), meridional wind (V), temperature (T), and specific humidity (Q). Rows display: (1-2) CRPS and RMSE as functions of forecast lead time (x-axis) and model level (colorbar, with lighter colors indicating upper atmosphere); (3) Spread-Skill Ratio (SSR) versus lead time, with dashed horizontal line at 1.0 indicating perfect calibration; (4-7) Rank histograms at 6h, 5d, 10d, and 15d lead times across ensemble member ranks (x-axis). Each rank histogram curve corresponds to a different model level (colored by vertical position). SDL-WXFormer (solid) and IFS-ENS (dashed) shown for SSR comparison. Model levels are hybrid $\sigma$--pressure coordinates; approximate pressure equivalents are provided for reference (e.g., level 90 $\approx$ 500 hPa, level 110 $\approx$ 850 hPa).
  • Figure 4: Ensemble forecast evolution for Winter Storm Izzy valid January 17, 2022 at 18:00 UTC. Each row represents a different forecast initialization time (2-10 days before the event). For each lead time, five randomly selected ensemble members are shown. Contours show MSLP at 10 hPa intervals.
  • Figure 5: Original ensemble member (left), regenerated forecast using stored latent tensor Z (center), and difference field (right) demonstrate exact reproduction and post-inference manipulation capability. The difference field shows maximum absolute error below 0.01 hPa, confirming that storing the latent tensor enables perfect regeneration of any ensemble member.
  • ...and 5 more figures
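
The SDL data flow in the Figure 1 caption (a linear map from $Z_\ell$ to $\mathbf{S}$, multiplied by $\mathbf{M}$ and $\mathbf{R}$, then added to the input features), together with the regeneration behaviour in Figure 5, can be sketched roughly as below. This is a toy in pure Python lists, not the authors' implementation. The names `sample_latent` and `sdl_forward`, the tensor shapes, the absence of a bias term, the storage of the noise field `r` alongside `z`, and the `alpha` factor applied to `z` are all assumptions made for illustration.

```python
import random


def sample_latent(dim, height, width, seed):
    """Draw a latent vector z and a per-pixel noise field r from N(0, 1).

    Storing (z, r), or just the seed, is assumed to be what makes a member
    reproducible; the paper's actual latent layout is not specified here.
    """
    rng = random.Random(seed)
    z = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    r = [[rng.gauss(0.0, 1.0) for _ in range(width)] for _ in range(height)]
    return z, r


def sdl_forward(x, z, r, style_weight, m, alpha=1.0):
    """Return x + M * S * R for features x of shape [C][H][W].

    S = style_weight @ (alpha * z) has one value per channel, M is a
    per-channel scale, and R is the [H][W] noise field. S and M are
    broadcast over the spatial dimensions, R over channels.
    """
    z_scaled = [alpha * zk for zk in z]  # hypothetical latent rescaling
    s = [sum(w * zk for w, zk in zip(row, z_scaled)) for row in style_weight]
    return [
        [[x[c][i][j] + m[c] * s[c] * r[i][j] for j in range(len(r[i]))]
         for i in range(len(r))]
        for c in range(len(x))
    ]


# Toy setup: 2 channels on a 2x3 grid with a 4-dimensional latent.
channels, height, width, dim = 2, 2, 3, 4
x = [[[0.0] * width for _ in range(height)] for _ in range(channels)]
style_weight = [[0.1 * (c + k + 1) for k in range(dim)] for c in range(channels)]
m = [0.5, 2.0]

z, r = sample_latent(dim, height, width, seed=0)
member = sdl_forward(x, z, r, style_weight, m)
regenerated = sdl_forward(x, z, r, style_weight, m)      # same latent, same field
wider = sdl_forward(x, z, r, style_weight, m, alpha=2.0)  # doubled perturbation
```

Because the assumed style map has no bias, the perturbation in this sketch is linear in `z`, so scaling the latent by `alpha` scales the perturbation of this one layer by the same factor. In a full network with nonlinear layers after each injection point, the effect on final spread would not be exactly linear.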