Huge Ensembles Part I: Design of Ensemble Weather Forecasts using Spherical Fourier Neural Operators

Ankur Mahesh; William Collins; Boris Bonev; Noah Brenowitz; Yair Cohen; Joshua Elms; Peter Harrington; Karthik Kashinath; Thorsten Kurth; Joshua North; Travis O'Brien; Michael Pritchard; David Pruitt; Mark Risser; Shashank Subramanian; Jared Willard

TL;DR

The paper addresses the challenge of sampling low-likelihood, high-impact (LLHI) weather events by proposing a huge ensemble built on a Spherical Fourier Neural Operator (SFNO) emulator. Its ensemble, SFNO-BVMC, uses bred vectors for initial-condition perturbations and multiple independently trained SFNO checkpoints for model perturbations, yielding calibrated probabilistic forecasts with many members at a fraction of the cost of physics-based models. Using mean, spectral, and extreme diagnostics, the authors show performance competitive with the ECMWF IFS ensemble and validate the extreme forecasts with the Extreme Forecast Index and tail-focused metrics, while examining spectral fidelity and noting the need for continued improvement. Part II expands to a 7,424-member ensemble to study LLHI events in greater depth, underscoring the potential of ML-based ensembles for climate risk assessment and decision support.

Abstract

Studying low-likelihood high-impact extreme weather events in a warming world is a significant and challenging task for current ensemble forecasting systems. While these systems presently use up to 100 members, larger ensembles could enrich the sampling of internal variability. They may capture the long tails associated with climate hazards better than traditional ensemble sizes. Due to computational constraints, it is infeasible to generate huge ensembles (comprised of 1,000-10,000 members) with traditional, physics-based numerical models. In this two-part paper, we replace traditional numerical simulations with machine learning (ML) to generate hindcasts of huge ensembles. In Part I, we construct an ensemble weather forecasting system based on Spherical Fourier Neural Operators (SFNO), and we discuss important design decisions for constructing such an ensemble. The ensemble represents model uncertainty through perturbed-parameter techniques, and it represents initial condition uncertainty through bred vectors, which sample the fastest growing modes of the forecast. Using the European Centre for Medium-Range Weather Forecasts Integrated Forecasting System (IFS) as a baseline, we develop an evaluation pipeline composed of mean, spectral, and extreme diagnostics. Using large-scale, distributed SFNOs with 1.1 billion learned parameters, we achieve calibrated probabilistic forecasts. As the trajectories of the individual members diverge, the ML ensemble mean spectra degrade with lead time, consistent with physical expectations. However, the individual ensemble members' spectra stay constant with lead time. Therefore, these members simulate realistic weather states, and the ML ensemble thus passes a crucial spectral test in the literature. The IFS and ML ensembles have similar Extreme Forecast Indices, and we show that the ML extreme weather forecasts are reliable and discriminating.

Paper Structure (19 sections, 10 equations, 24 figures, 2 tables)

Introduction
Designing ensembles with SFNO
Selecting an emulator
Selecting a number of checkpoints for the ensemble
Bred vectors with SFNO
Contributions of bred vectors and multiple checkpoints to the ensemble calibration
Ensemble Diagnostics
Mean Diagnostics
Spectral Diagnostics
Extreme Diagnostics
Extreme Forecast Index
Reliability and Discrimination
Threshold-weighted Continuous Ranked Probability Score
Discussion and Conclusion
Case Study: 2023 Phoenix Heatwave
...and 4 more sections

Figures (24)

Figure 1: Overview of ensemble architecture. The ensemble is constructed using two methods: initial condition perturbations and model perturbations. The initial condition perturbations are generated using bred vectors, which sample the fastest growing errors in the initial condition. Model perturbations consist of twenty-nine instances of the SFNO model trained independently from scratch. Bred vectors are generated separately for each SFNO checkpoint. Each bred vector creates two initial condition perturbations: one with the bred vector added to the initial condition, and one with the bred vector subtracted from the initial condition. For the small ensemble, we use $N = 1$ bred vector per checkpoint. For the huge ensemble in Part II, there are $N = 128$ bred vectors per checkpoint.
Figure 2: Comparing different versions of SFNO. (a) The 850 hPa temperature spread-error ratios are compared for lagged ensembles. A lagged ensemble is created by using nine adjacent time steps as initial conditions, and the spread-error ratio is shown for each SFNO configuration. (b) Relative power spectra at a lead time of 360 hours (colored lines) for 850 hPa temperature for a large SFNO (scale factor 2 and embed dimension 620), a medium-sized SFNO (scale factor 4 and embed dimension 384), and a small SFNO (scale factor 6 and embed dimension 220). Spectra are computed relative to the ERA5 spectrum (horizontal black line).
Figure 3: Ensemble spread from different numbers of checkpoints. Ensemble spread is calculated as the square root of time-mean, global-mean variance (Fortin et al., 2014). A correction factor of N-1 is applied to account for different ensemble sizes in the unbiased estimator of variance. At a lead time of five days, ensemble spread is averaged over forecasts from fifty-two initial conditions in the validation set (one per week starting 01-02-2018). Ensemble spread is shown for total column water vapor (left), 10m wind speed (middle), and 2m temperature (right). For each number of SFNO checkpoints, 200 estimates of ensemble spread are obtained by taking 100 bootstrap random samples of the SFNO checkpoints. The box-and-whiskers visualize the distribution of these 200 trials: the middle of the box is the median, the ends of the box are the first and third quartiles of the data, and the ends of the whiskers correspond to the minimum and maximum.
Figure 4: Diagram of generating bred vectors. This diagram details the process of generating the bred vectors used to develop initial condition perturbations at $t_0$. First, using the input two time steps before $t_0$ (denoted $t_{-2}$), random noise is added to 500 hPa geopotential (z500). This noise respects spherical geometry and has a spatial correlation length scale of 500 km. With $t_{-2}$ as the initial condition, the perturbed forecast is subtracted from the control forecast. This difference is rescaled and used as a new perturbation, which is added to $t_{-1}$. This process is repeated for $t_{0}$. For each variable during every step of the breeding process, the amplitude of the perturbation is scaled to be 0.35 times the deterministic RMSE of SFNO at 48 hours.
Figure 5: Sample visualizations of the learned bred vectors. For a sample initial time (June 18, 2020 00:00 UTC), the bred vectors are visualized for six different input fields: 850 hPa specific humidity, 10m wind speed, surface pressure, 2m temperature, 500 hPa geopotential, and 850 hPa temperature.
...and 19 more figures
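
The breeding cycle in the Figure 4 caption and the paired perturbations in the Figure 1 caption can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: `toy_model` is a hypothetical stand-in for one SFNO forecast step, the seed noise is uncorrelated Gaussian noise rather than the spherical, spatially correlated z500 noise of the caption, and a single scalar `amplitude` (0.35 times a made-up RMSE of 1.0) replaces the per-variable scaling.

```python
import numpy as np

def toy_model(state):
    """Hypothetical stand-in for one SFNO forecast step (not the real model)."""
    return np.roll(state, 1) + 0.1 * np.sin(state)

def rescale(perturbation, amplitude):
    """Rescale a perturbation so that its root-mean-square equals amplitude."""
    rms = np.sqrt(np.mean(perturbation ** 2))
    return perturbation * (amplitude / rms)

def breed_vector(analyses, amplitude, rng):
    """Run a breeding cycle over analyses [t_-2, t_-1, t_0]; return the vector at t_0."""
    # Seed the cycle with random noise at the earliest time (assumption: white noise).
    perturbation = rescale(rng.standard_normal(analyses[0].shape), amplitude)
    for analysis in analyses[:-1]:
        control = toy_model(analysis)
        perturbed = toy_model(analysis + perturbation)
        # Caption convention: the perturbed forecast is subtracted from the control.
        perturbation = rescale(control - perturbed, amplitude)
    return perturbation

def perturbed_initial_conditions(analysis_t0, bred_vectors):
    """Each bred vector yields two members: one added, one subtracted."""
    members = []
    for bred_vector in bred_vectors:
        members.append(analysis_t0 + bred_vector)
        members.append(analysis_t0 - bred_vector)
    return members

def ensemble_size(n_checkpoints, n_bred_vectors):
    """Checkpoints times bred vectors per checkpoint times two signs."""
    return n_checkpoints * n_bred_vectors * 2

rng = np.random.default_rng(0)
analyses = [rng.standard_normal(64) for _ in range(3)]  # synthetic t_-2, t_-1, t_0
amplitude = 0.35 * 1.0  # 0.35 times a hypothetical 48-hour RMSE of 1.0
bred = breed_vector(analyses, amplitude, rng)
members = perturbed_initial_conditions(analyses[-1], [bred])

print(ensemble_size(29, 1))    # 58 members in the small ensemble
print(ensemble_size(29, 128))  # 7424 members in the huge ensemble
```

With twenty-nine checkpoints, the member counts follow directly from the caption: 58 for $N = 1$ and 7,424 for $N = 128$, the latter matching the ensemble size quoted for Part II.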
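
The spread calculation in the Figure 3 caption can be illustrated with a short sketch. It rests on several assumptions that the caption does not confirm: the global mean is taken with cosine-of-latitude area weights, axis 0 of the hypothetical `forecasts` array holds one forecast per checkpoint, and the random samples of checkpoints are drawn without replacement. The data are synthetic.

```python
import numpy as np

def ensemble_spread(forecasts, lat_deg):
    """Square root of the time-mean, global-mean ensemble variance.

    forecasts: array with shape (member, time, lat, lon).
    lat_deg:   latitudes in degrees, shape (lat,).
    """
    # Unbiased variance across members (divides by N - 1).
    variance = forecasts.var(axis=0, ddof=1)            # (time, lat, lon)
    # Assumption: area weighting proportional to cos(latitude).
    weights = np.cos(np.deg2rad(lat_deg))
    weights = weights / weights.sum()
    zonal_mean = variance.mean(axis=-1)                  # (time, lat)
    global_mean = (zonal_mean * weights).sum(axis=-1)    # (time,)
    return np.sqrt(global_mean.mean())

def subsampled_spreads(forecasts, lat_deg, n_members, n_trials, rng):
    """Spread estimates from random subsets of members (here: checkpoints)."""
    spreads = []
    for _ in range(n_trials):
        # Assumption: subsets are drawn without replacement.
        idx = rng.choice(forecasts.shape[0], size=n_members, replace=False)
        spreads.append(ensemble_spread(forecasts[idx], lat_deg))
    return np.array(spreads)

rng = np.random.default_rng(0)
lat = np.linspace(-87.5, 87.5, 8)
forecasts = rng.standard_normal((29, 4, 8, 16))  # synthetic: 29 checkpoints
spreads = subsampled_spreads(forecasts, lat, n_members=10, n_trials=100, rng=rng)
print(np.median(spreads), spreads.min(), spreads.max())
```

The median, minimum, and maximum of `spreads` correspond to the quantities a box-and-whisker plot of this kind displays for one ensemble size; repeating the call for each `n_members` would give one box per number of checkpoints.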
