GenCast: Diffusion-based ensemble forecasting for medium-range weather

Ilan Price; Alvaro Sanchez-Gonzalez; Ferran Alet; Tom R. Andersson; Andrew El-Kadi; Dominic Masters; Timo Ewalds; Jacklynn Stott; Shakir Mohamed; Peter Battaglia; Remi Lam; Matthew Willson

TL;DR

GenCast is a diffusion-based probabilistic weather model, trained on ERA5 reanalysis, that produces 15-day ensemble forecasts at 0.25 degree resolution in 8 minutes. It outperforms ECMWF's ENS on 97.4% of the 1320 evaluated targets, is well calibrated, produces sharp and realistic samples, and performs better on extreme events and spatially pooled tasks. It also improves downstream applications such as regional wind power forecasting and tropical cyclone tracking. The work highlights the potential of generative AI methods to advance operational weather forecasting, while noting practical considerations for deployment and data assimilation.

Abstract

Weather forecasts are fundamentally uncertain, so predicting the range of probable weather scenarios is crucial for important decisions, from warning the public about hazardous weather to planning renewable energy use. Here, we introduce GenCast, a probabilistic weather model with greater skill and speed than the top operational medium-range weather forecast in the world, ENS, the ensemble forecast of the European Centre for Medium-Range Weather Forecasts (ECMWF). Unlike traditional approaches, which are based on numerical weather prediction (NWP), GenCast is a machine learning weather prediction (MLWP) method, trained on decades of reanalysis data. GenCast generates an ensemble of stochastic 15-day global forecasts, at 12-hour steps and 0.25 degree latitude-longitude resolution, for over 80 surface and atmospheric variables, in 8 minutes. It has greater skill than ENS on 97.4% of 1320 targets we evaluated, and better predicts extreme weather, tropical cyclones, and wind power production. This work helps open the next chapter in operational weather forecasting, where critical weather-dependent decisions are made with greater accuracy and efficiency.
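The skill comparison quoted above is scored with CRPS (the Figure 3 caption reports CRPS skill for GenCast versus ENS). As a rough illustration of how an ensemble is scored against a single verifying value, the sketch below implements the standard ensemble CRPS estimator, with an optional "fair" variant that corrects for finite ensemble size. The function name `ensemble_crps` and the `fair` flag are illustrative assumptions; which estimator the paper uses is not stated in this summary.

```python
def ensemble_crps(members, observation, fair=False):
    """CRPS of a scalar ensemble forecast against one observed value.

    Standard estimator: mean|x_i - y| - sum_ij|x_i - x_j| / (2 M^2).
    Fair estimator:     mean|x_i - y| - sum_ij|x_i - x_j| / (2 M (M - 1)).
    Lower is better; with a single member the standard form reduces to
    the absolute error.
    """
    m = len(members)
    if m == 0:
        raise ValueError("need at least one ensemble member")
    if fair and m < 2:
        raise ValueError("the fair estimator needs at least two members")
    # Accuracy term: average distance between members and the observation.
    accuracy = sum(abs(x - observation) for x in members) / m
    # Spread term: summed distance over all ordered pairs of members.
    pair_sum = sum(abs(a - b) for a in members for b in members)
    denom = 2 * m * (m - 1) if fair else 2 * m * m
    return accuracy - pair_sum / denom


if __name__ == "__main__":
    # Hypothetical two-member ensemble verified against an observation of 2.0.
    print(ensemble_crps([1.0, 3.0], 2.0))
    print(ensemble_crps([1.0, 3.0], 2.0, fair=True))
```

In practice a score like this would be averaged over grid points and initialisation times for each variable, level, and lead time before two systems are compared.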

Paper Structure (74 sections, 22 equations, 65 figures, 6 tables)

Introduction
GenCast
Realism of GenCast samples
Baselines
Ensemble skill
Ensemble calibration
Extreme weather events
Local surface extremes
Tropical cyclones
Skill in predicting the joint distribution
Spatially pooled evaluation
Regional wind power forecasting
Conclusion
Task definition and general approach
Data
...and 59 more sections

Figures (65)

Figure 1: Schematic of how GenCast produces a forecast. The blue box shows how conditioning inputs, $(X^0, X^{-1})$, and an initial noise sample, $Z^1_0$, are refined by the neural network refinement function, $r_\theta$ (green box), which is parameterised by $\theta$. The resulting $Z^1_1$ is the first refined candidate state, and this process repeats $N$ times. The final $Z^1_N$ is then added as a residual to $X^0$, to produce the weather state at the next time step, $X^1$. This process then repeats, autoregressively, $T=30$ times, conditioning on $(X^{t},X^{t-1})$ and using a new initial noise sample $Z^t_0$ at each step, to produce the full weather trajectory sample (for visual clarity, we illustrate the previous state in parentheses, $(X^{t-1})$, below the current state, $X^{t}$). Each trajectory generated via independent $Z^{1:T}_0$ noise samples represents a sample from $p\left(X^{1:T} \vert X^0, X^{-1}\right)$.
Figure 2: Visualisation of forecasts and tropical cyclone tracks for the validity time of 12 October, 2019, 06 UTC, hours before Typhoon Hagibis made landfall in Japan. (a) The ERA5 analysis state for specific humidity at 700hPa, at validity time 06 UTC, October 12, 2019, shows Typhoon Hagibis near the center of the frame. (b-d) Three sample GenCast forecast states, initialised one day earlier, show how the samples are sharp, and very similar to one another. (e) The GenCast ensemble mean, obtained by computing the mean of 50 sample states like in (b-d), is somewhat blurry, showing how uncertainty results in a blurrier average state. (f) The forecast state from (deterministic) GraphCast, initialised one day earlier like in (b-e), is blurry, similar to GenCast's ensemble mean. (g) The spatial power spectrum of the states in (a), (b), (e), and (f), where line colors match the frames of the panels, shows how GenCast samples' spectra closely match ERA5's, while the blurrier GenCast ensemble mean and GraphCast states have less power at shorter wavelengths. (h-m) These subplots are analogous to (b-g), except the forecasts are initialised 15 days earlier. The GenCast samples are still sharp (h-j), while the GenCast ensemble mean (k) and GraphCast (l) states are even blurrier than at the 1-day lead time. This is also reflected in the power spectrum (m), where the GenCast samples' spectra still closely match ERA5's, while the GenCast ensemble mean and GraphCast states have even less power in the shorter wavelengths relative to the 1-day lead time in (g). See \ref{sec:app:visualizations_hagibis} for more variables and lead times. (n-q) Typhoon Hagibis's trajectory based on ERA5 (in red) and the ensemble of tropical cyclone trajectories from GenCast (in blue) up to a validity time 4 hours before the cyclone made landfall on Japan. GenCast forecasts are shown at lead times of 7, 5, 3, and 1 day(s). The blue and red circles show cyclone locations at the validity time. At long lead times the cyclone trajectories have substantial spread, while for the shorter lead times the predictive uncertainty collapses to a small range of trajectories. See \ref{sec:app:cyclone_tracks} for details and additional cyclone visualisations.
Figure 3: CRPS scores for GenCast versus ENS in 2019. The scorecard compares CRPS skill between GenCast and ENS across all variables and 8 pressure levels, where dark blue indicates GenCast is 20% better than ENS, dark red indicates GenCast is 20% worse, and white means they perform equally. The results indicate that GenCast significantly outperforms ENS ($p < 0.05$) on 97.4% of all reported variable, lead time, and level combinations. Hatched regions indicate where neither model is significantly better.
Figure 4: Extreme weather, joint measures, and wind power. (a) Relative economic value (REV) for predictions of the exceedance of the 99.99th percentile for 2m temperature, at lead times of 1, 5 and 7 days. (b) Relative economic value (REV) for predictions of the presence of a cyclone in any location at a given time, at lead times of 1, 3, and 5 days. (c) Relative CRPS of max-pooled 2m temperature, for different pooling region sizes. (d) Relative CRPS of the total wind power summed across wind farm locations in pooling regions of different sizes.
Figure 51: Per-timestep TempestExtremes cyclone counts for ERA5 and HRES-fc0. The two time-series exhibit high correlation, but HRES-fc0 has $23\%$ more cyclones than ERA5. The TempestExtremes tracker is applied to these analysis datasets as described in section \ref{sec:app:cyclones:tempest_extremes}. We then arbitrarily picked a lead time of 4 days to extract cyclone count (all lead times 0-10 days yield very similar results).
...and 60 more figures
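
The sampling loop described in the Figure 1 caption can be sketched as two nested loops: an outer autoregressive loop over $T=30$ forecast steps and an inner loop of $N$ refinement steps applied to a fresh noise sample. The code below is a minimal toy sketch under stated assumptions, not the GenCast implementation: states are plain lists of floats, `toy_refine` is a hypothetical stand-in for the trained network $r_\theta$, and `num_refine_steps=4` is an arbitrary choice, since the value of $N$ is not given in this summary.

```python
import random


def toy_refine(z, x_curr, x_prev):
    """Hypothetical stand-in for r_theta (NOT the trained GenCast network).

    Moves the candidate residual z halfway toward a toy target, the most
    recent tendency x_curr - x_prev. Part of the initial noise survives,
    so different noise draws give different trajectories.
    """
    target = [c - p for c, p in zip(x_curr, x_prev)]
    return [zi + 0.5 * (ti - zi) for zi, ti in zip(z, target)]


def sample_trajectory(x0, x_minus1, num_lead_steps=30, num_refine_steps=4,
                      rng=None, refine=toy_refine):
    """Draw one trajectory X^{1:T} given (X^0, X^{-1}), following Figure 1."""
    if rng is None:
        rng = random.Random()
    x_curr, x_prev = list(x0), list(x_minus1)
    trajectory = []
    for _ in range(num_lead_steps):            # T autoregressive steps
        # Fresh initial noise sample Z^t_0 for this step.
        z = [rng.gauss(0.0, 1.0) for _ in x_curr]
        for _ in range(num_refine_steps):      # N refinement steps
            z = refine(z, x_curr, x_prev)
        # The refined Z^t_N is added as a residual to the current state.
        x_next = [c + zi for c, zi in zip(x_curr, z)]
        trajectory.append(x_next)
        # The next step conditions on (X^t, X^{t-1}).
        x_prev, x_curr = x_curr, x_next
    return trajectory


def sample_ensemble(x0, x_minus1, num_members=4, seed=0):
    """Build an ensemble by giving each member its own noise stream."""
    return [sample_trajectory(x0, x_minus1, rng=random.Random(seed + m))
            for m in range(num_members)]


if __name__ == "__main__":
    ensemble = sample_ensemble([1.0, 2.0], [1.0, 2.0])
    print(len(ensemble), len(ensemble[0]))
```

Because every member starts from the same $(X^0, X^{-1})$ and differs only in its noise samples, the spread across members is what represents forecast uncertainty in this setup.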
