Huge Ensembles Part II: Properties of a Huge Ensemble of Hindcasts Generated with Spherical Fourier Neural Operators

Ankur Mahesh; William Collins; Boris Bonev; Noah Brenowitz; Yair Cohen; Peter Harrington; Karthik Kashinath; Thorsten Kurth; Joshua North; Travis O'Brien; Michael Pritchard; David Pruitt; Mark Risser; Shashank Subramanian; Jared Willard

TL;DR

The paper advances the study of extreme-weather statistics by creating Huge Ensembles (HENS) of hindcasts with Spherical Fourier Neural Operators, using bred vectors as initial-condition perturbations and multiple independently trained checkpoints as model perturbations to obtain a large, scalable sample of possible atmospheric trajectories. It demonstrates that HENS samples the tails of the forecast distribution and improves probabilistic forecasts, evaluated with metrics such as information gain and the outcome-weighted continuous ranked probability score (owCRPS), and compares HENS against smaller ensembles and traditional forecasts. The study also addresses the practical challenges of generating, storing, and reproducing petabyte-scale machine learning (ML) ensembles, and discusses how such ensembles can complement traditional numerical weather prediction (NWP) for studying low-likelihood, high-impact events (LLHIs) and for improving uncertainty quantification. Overall, HENS offers a scalable, data-driven tool for studying extreme events and counterfactual weather scenarios, with open resources to support reproducibility and further research.
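owCRPS (outcome-weighted CRPS) is one of the metrics named above. As an illustration only, the sketch below implements one common ensemble estimator of it: the weight of the observed outcome, w(y), multiplied by the CRPS of the ensemble reweighted by w. The weight function, the function names `crps_ensemble` and `ow_crps_ensemble`, and the use of the standard (rather than "fair") CRPS estimator are assumptions; the paper's exact weighting is not stated in this summary.

```python
import numpy as np

def crps_ensemble(members, obs):
    """Standard ensemble CRPS estimator: E|X - y| - 0.5 * E|X - X'|."""
    x = np.asarray(members, dtype=float)
    term1 = np.mean(np.abs(x - obs))
    term2 = 0.5 * np.mean(np.abs(x[:, None] - x[None, :]))
    return float(term1 - term2)

def ow_crps_ensemble(members, obs, weight):
    """Outcome-weighted CRPS: w(y) times the CRPS of the ensemble reweighted by w.

    `weight` is a hypothetical user-supplied function mapping values to
    non-negative weights, e.g. an indicator of exceeding a heat threshold.
    """
    x = np.asarray(members, dtype=float)
    w_x = np.asarray(weight(x), dtype=float)
    w_y = float(weight(obs))
    if w_y == 0.0:
        return 0.0  # outcome outside the region of interest: score vanishes
    w_bar = w_x.mean()
    if w_bar == 0.0:
        return float("nan")  # no member carries weight: score undefined here
    # Weighted first moment term, normalized by the mean member weight.
    term1 = np.mean(np.abs(x - obs) * w_x) / w_bar
    # Weighted spread term over all member pairs, normalized by w_bar squared.
    term2 = 0.5 * np.mean(np.abs(x[:, None] - x[None, :]) * np.outer(w_x, w_x)) / w_bar**2
    return float(w_y * (term1 - term2))
```

For example, with a hypothetical 30 °C threshold, `hot = lambda z: (np.asarray(z) > 30.0).astype(float)` and `ow_crps_ensemble(np.array([24.0, 27.5, 29.0, 31.0, 33.5]), 32.0, hot)` scores only the two members above 30 °C against the observation. With a threshold indicator, the score reduces to the ordinary CRPS of the members inside the region of interest, which is why it emphasizes skill on extremes.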

Abstract

In Part I, we created an ensemble based on Spherical Fourier Neural Operators. As initial condition perturbations, we used bred vectors, and as model perturbations, we used multiple checkpoints trained independently from scratch. Based on diagnostics that assess the ensemble's physical fidelity, our ensemble has comparable performance to operational weather forecasting systems. However, it requires orders of magnitude fewer computational resources. Here in Part II, we generate a huge ensemble (HENS), with 7,424 members initialized each day of summer 2023. We enumerate the technical requirements for running huge ensembles at this scale. HENS precisely samples the tails of the forecast distribution and presents a detailed sampling of internal variability. HENS has two primary applications: (1) as a large dataset with which to study the statistics and drivers of extreme weather and (2) as a weather forecasting system. For extreme climate statistics, HENS samples events 4σ away from the ensemble mean. At each grid cell, HENS increases the skill of the most accurate ensemble member and enhances coverage of possible future trajectories. As a weather forecasting model, HENS issues extreme weather forecasts with better uncertainty quantification. It also reduces the probability of outlier events, in which the verification value lies outside the ensemble forecast distribution.
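The outlier and best-member claims in the abstract can be made concrete. For a perfectly calibrated ensemble of N exchangeable members, the verifying value is equally likely to occupy any of the N + 1 rank positions, so it falls outside the ensemble range with probability 2/(N + 1). The hypothetical helpers below (names are illustrative, not from the paper) compute the empirical outlier fraction, that calibrated baseline, and the absolute error of the closest ("most accurate") member in each case.

```python
import numpy as np

def outlier_fraction(ens, obs):
    """Fraction of cases whose verifying value lies outside the ensemble range.

    ens: array of shape (n_cases, n_members); obs: array of shape (n_cases,).
    """
    ens = np.asarray(ens, dtype=float)
    obs = np.asarray(obs, dtype=float)
    outside = (obs < ens.min(axis=1)) | (obs > ens.max(axis=1))
    return float(outside.mean())

def expected_outlier_fraction(n_members):
    """Outlier rate of a perfectly calibrated, exchangeable ensemble: 2 / (N + 1)."""
    return 2.0 / (n_members + 1)

def best_member_abs_error(ens, obs):
    """Absolute error of the member closest to the verification, per case."""
    ens = np.asarray(ens, dtype=float)
    obs = np.asarray(obs, dtype=float)
    return np.min(np.abs(ens - obs[:, None]), axis=1)
```

For a 50-member ensemble the calibrated baseline is about 0.039, while for 7,424 members it is about 0.00027, so a well-calibrated ensemble of HENS size should see outlier events far less often. Likewise, the minimum over more members can only stay the same or shrink, which is consistent with a larger ensemble improving the best member's skill.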

Paper Structure

This paper contains 30 sections, 52 equations, 24 figures, and 1 table.

Introduction
Generating the Huge Ensemble
Technical Setup
Regenerating the Ensemble
Climate and Extreme Statistics
Sampling the Forecast Distribution with Huge Ensembles
Sampling the Observed Distribution with Huge Ensembles
Validating Huge Ensemble Weather Forecasts
Metrics based on the entire distribution and metrics based on the conditional distribution
Confidence Intervals of Extreme Forecasts
Missed Events in HENS and IFS
Discussion and Conclusion
References for Ensembles Listed in Figure 1
Post-processing Data to Improve Technical Analysis of the Huge Ensemble
Effect of Ensemble Size on Reliability Diagrams and Spread Error Ratio
(15 further sections not listed)

Figures (24)

Figure 1: Ensemble Sizes in Weather and Climate Prediction. The left panel shows the ensemble sizes of traditional ensembles, which rely on numerical, physics-based simulation. The right panel shows the ensemble sizes of machine learning weather prediction ensembles. Huge Ensembles (HENS), the ensemble presented here, is highlighted in red. Bracketed numbers correspond to the numbered list of references for this figure, provided in the appendix section "References for Ensembles Listed in Figure 1".
Figure 2: Information Gain from Huge Ensembles (HENS). Information gain is the maximum number of standard deviations from the mean that are sampled by the ensemble. The mean and standard deviation are calculated from the ensemble distribution itself. Gain is calculated for the ensemble predictions of the global land-mean value of each variable. For a Gaussian distribution, the theoretical information gain as a function of ensemble size is shown with the dotted black line. Using the HENS hindcasts from a 7,424-member ensemble initialized each day of boreal summer 2023, the empirical gain for each variable is shown as a function of ensemble size. Results are shown for a 240-hour lead time (forecast day 10). Note the use of a logarithmic scale on the x-axis.
Figure 3: Information Gain from Huge Ensembles (HENS) at each grid cell. Information gain is calculated using the same method as in Figure 2, but at each grid cell rather than on the global land-mean value. (a) Information gain for huge ensembles (7,424 members). (b) Information gain for 50-member ensembles. Gain is calculated for 2m temperature at a lead time of 10 days, across all forecasts initialized in boreal summer 2023.
Figure 4: Large Sample Behavior of Huge Ensembles (HENS). The ensemble mean, standard deviation, and 0.1th, 10th, 90th, and 99.9th percentiles of global land-mean 2m temperature are shown for different ensemble sizes. For comparison across initial times, all statistics are normalized by the full-ensemble standard deviation, calculated separately for each forecast initial date. Statistics are averaged over 92 initial times (one for each day of boreal summer 2023 at 00:00 UTC). The "true" statistic is calculated from the full 7,424-member huge ensemble; the solid green line and shading indicate the mean and 95 percent confidence interval, respectively, calculated from bootstrap random samples from the ensemble. Statistics are shown for a 240-hour lead time (forecast day 10).
Figure 5: Demonstration of using Huge Ensembles for heatwave forecasts in Kansas City, Missouri, USA. (a) Box plot of the ensemble forecast of heat index as a function of initial time. Blue denotes HENS forecasts and red denotes IFS forecasts. Whiskers extend to the farthest data points within 1.5× the interquartile range. (b) 2D density plot of 10-day forecasts of 2m dewpoint and 2m air temperature. The outermost contour is the 95th percentile kernel density estimate of the ensemble distribution, and successive contours decrease in intervals of 10 percent. Blue dots indicate forecasts of individual HENS members; magenta triangles indicate forecasts from IFS ensemble members; the black star is ERA5 (the verification dataset for HENS); and the gray star is the operational analysis (the verification dataset for IFS). The dashed line is the climatological average temperature at this location.
(19 further figures not listed)
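Figures 2 to 4 rest on two simple computations: information gain as a function of ensemble size, and bootstrap resampling of sub-ensembles. The sketch below is a minimal version under stated assumptions. It treats information gain as the largest absolute deviation of any member from the ensemble mean divided by the ensemble's own sample standard deviation (ddof=1; the caption does not say whether deviations are one-sided), estimates the Gaussian reference curve by Monte Carlo rather than any closed form the authors may use, and normalizes bootstrap statistics by the full-ensemble standard deviation as in Figure 4. All function names are hypothetical.

```python
import numpy as np

def information_gain(members):
    """Largest |member - ensemble mean| in units of the ensemble std (needs >= 2 members)."""
    x = np.asarray(members, dtype=float)
    return float(np.max(np.abs(x - x.mean())) / x.std(ddof=1))

def gain_vs_ensemble_size(members, sizes, n_draws=100, seed=0):
    """Mean information gain of random sub-ensembles drawn without replacement."""
    rng = np.random.default_rng(seed)
    x = np.asarray(members, dtype=float)
    return {n: float(np.mean([information_gain(rng.choice(x, size=n, replace=False))
                              for _ in range(n_draws)]))
            for n in sizes}

def gaussian_reference_gain(sizes, n_draws=200, seed=1):
    """Monte Carlo estimate of the gain expected from n independent Gaussian draws."""
    rng = np.random.default_rng(seed)
    return {n: float(np.mean([information_gain(rng.standard_normal(n))
                              for _ in range(n_draws)]))
            for n in sizes}

def bootstrap_normalized_stat(members, n, stat, n_boot=1000, seed=2):
    """Bootstrap mean and 95% interval of stat(size-n resample) / full-ensemble std."""
    rng = np.random.default_rng(seed)
    x = np.asarray(members, dtype=float)
    full_std = x.std(ddof=1)
    vals = np.array([stat(rng.choice(x, size=n, replace=True))
                     for _ in range(n_boot)]) / full_std
    lo, hi = np.percentile(vals, [2.5, 97.5])
    return float(vals.mean()), (float(lo), float(hi))
```

A usage example on synthetic data (a Gaussian stand-in, not HENS output): `fake = np.random.default_rng(42).standard_normal(7424)`, then `gain_vs_ensemble_size(fake, [50, 500, 7424], n_draws=20)` traces a curve comparable to `gaussian_reference_gain([50, 500, 7424])`, and `bootstrap_normalized_stat(fake, 50, lambda s: np.percentile(s, 99.9))` shows how noisy a 99.9th-percentile estimate is with only 50 members.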
