GraphCast: Learning skillful medium-range global weather forecasting

Remi Lam; Alvaro Sanchez-Gonzalez; Matthew Willson; Peter Wirnsberger; Meire Fortunato; Ferran Alet; Suman Ravuri; Timo Ewalds; Zach Eaton-Rosen; Weihua Hu; Alexander Merose; Stephan Hoyer; George Holland; Oriol Vinyals; Jacklynn Stott; Alexander Pritzel; Shakir Mohamed; Peter Battaglia

TL;DR

GraphCast is an autoregressive weather model built on graph neural networks and trained directly on ERA5 reanalysis data to produce global medium-range forecasts at 0.25° resolution. Its encode-process-decode architecture transfers information from the grid to a refined icosahedral multi-mesh and back, which enables efficient long-range interactions with 36.7 million parameters. Compared with ECMWF's HRES, GraphCast is more accurate on 90% of 1380 verification targets over 10 days and gives better forecasts of cyclone tracks, atmospheric rivers, and extreme temperatures, while producing a 10-day forecast in under one minute. The work shows that data-driven weather forecasting is viable at large scale and points to uncertainty handling and retraining on recent data as important directions for future improvement.

Abstract

Global medium-range weather forecasting is critical to decision-making across many social and economic domains. Traditional numerical weather prediction uses increased compute resources to improve forecast accuracy, but cannot directly use historical weather data to improve the underlying model. We introduce a machine learning-based method called "GraphCast", which can be trained directly from reanalysis data. It predicts hundreds of weather variables, over 10 days at 0.25 degree resolution globally, in under one minute. We show that GraphCast significantly outperforms the most accurate operational deterministic systems on 90% of 1380 verification targets, and its forecasts support better severe event prediction, including tropical cyclones, atmospheric rivers, and extreme temperatures. GraphCast is a key advance in accurate and efficient weather forecasting, and helps realize the promise of machine learning for modeling complex dynamical systems.
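The autoregressive forecast described here (and in panel (c) of Figure 1) can be illustrated with a minimal Python sketch. It is not the authors' implementation: `rollout`, `toy_step`, and `STEP_HOURS` are hypothetical names, the 6-hour step and the use of two input states are assumptions made for illustration, and the toy step function is a linear extrapolation standing in for the learned model.

```python
from typing import Callable, List, Sequence

State = List[float]  # stand-in for a full gridded weather state


def rollout(
    step_fn: Callable[[State, State], State],
    initial_states: Sequence[State],
    num_steps: int,
) -> List[State]:
    """Feed each prediction back in as input to produce a trajectory."""
    history = list(initial_states)  # assumed: the two most recent states
    forecast = []
    for _ in range(num_steps):
        next_state = step_fn(history[-2], history[-1])
        forecast.append(next_state)
        history.append(next_state)  # the prediction becomes the next input
    return forecast


STEP_HOURS = 6                       # assumption: one model step = 6 hours
num_steps = 10 * 24 // STEP_HOURS    # steps needed to cover 10 days


def toy_step(prev: State, curr: State) -> State:
    # Placeholder dynamics (linear extrapolation), NOT a weather model.
    return [2.0 * c - p for p, c in zip(prev, curr)]


forecast = rollout(toy_step, [[0.0], [1.0]], num_steps)
```

Because every step consumes earlier predictions, errors can accumulate with lead time, which is why skill in Figure 2 is reported per lead time.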

Paper Structure (100 sections, 39 equations, 53 figures, 5 tables)

Datasets
ERA5
HRES
HRES operational forecasts
HRES-fc0
HRES NaN handling
Tropical cyclone datasets
Notation and problem statement
Time notation
General forecasting problem statement
Modeling ECMWF weather data
GraphCast model
Generating a forecast
Architecture overview
GraphCast's graph
...and 85 more sections

Figures (53)

Figure 1: Model schematic. (a) The input weather state(s) are defined on a 0.25° latitude-longitude grid comprising a total of $721\times 1440 = 1{,}038{,}240$ points. Yellow layers in the closeup pop-out window represent the 5 surface variables, and blue layers represent the 6 atmospheric variables that are repeated at 37 pressure levels ($5 + 6\times 37 = 227$ variables per point in total), resulting in a state representation of $235{,}680{,}480$ values. (b) GraphCast predicts the next state of the weather on the grid. (c) A forecast is made by iteratively applying GraphCast to each previous predicted state, to produce a sequence of states which represent the weather at successive lead times. (d) The Encoder component of the GraphCast architecture maps local regions of the input (green boxes) into nodes of the multi-mesh graph representation (green, upward arrows which terminate in the green-blue node). (e) The Processor component updates each multi-mesh node using learned message-passing (heavy blue arrows that terminate at a node). (f) The Decoder component maps the processed multi-mesh features (purple nodes) back onto the grid representation (red, downward arrows which terminate at a red box). (g) The multi-mesh is derived from icosahedral meshes of increasing resolution, from the base mesh ($M^0$, $12$ nodes) to the finest resolution ($M^6$, $40{,}962$ nodes), which has uniform resolution across the globe. It contains the set of nodes from $M^6$, and all the edges from $M^0$ to $M^6$. The learned message-passing over the different meshes' edges happens simultaneously, so that each node is updated by all of its incoming edges.
Figure 2: Skill and skill scores for GraphCast and HRES in 2018. (a) RMSE skill (y-axis) for GraphCast (blue lines) and HRES (black lines), on z500, as a function of lead time (x-axis). Error bars represent 95% confidence intervals. The vertical dashed line represents 3.5 days, which is the last 12-hour increment of the HRES 06z/18z forecasts. The black line represents HRES, where lead times earlier and later than 3.5 days are from the 06z/18z and 00z/12z initializations, respectively. (b) RMSE skill score (y-axis) for GraphCast versus HRES, on z500, as a function of lead time (x-axis). Error bars represent 95% confidence intervals for the skill score. We observe a discontinuity in GraphCast's curve because skill scores up to 3.5 days are computed between GraphCast (initialized at 06z/18z) and HRES's 06z/18z initialization, while after 3.5 days skill scores are computed with respect to HRES's 00z/12z initializations. (c) ACC skill (y-axis) for GraphCast (blue lines) and HRES (black lines), on z500, as a function of lead time (x-axis). (d) Scorecard of RMSE skill scores for GraphCast, with respect to HRES. Each subplot corresponds to one variable: u, v, z, t, q, 2t, 10u, 10v, msl, respectively. The rows of each heatmap correspond to the 13 pressure levels (for the atmospheric variables), from 50 hPa at the top to 1000 hPa at the bottom. The columns of each heatmap correspond to the 20 lead times at 12-hour intervals, from 12 hours on the left to 10 days on the right. Each cell's color represents the skill score, as shown in (b), where blue represents negative values (GraphCast has better skill) and red represents positive values (HRES has better skill).
Figure 3: Severe-event prediction. (a) Cyclone forecasting performance for GraphCast and HRES. The x-axis represents lead times (in days), and the y-axis represents median track error (in km). Error bars represent bootstrapped 95% confidence intervals for the median. (b) Cyclone forecasting paired error difference between GraphCast and HRES. The x-axis represents lead times (in days), and the y-axis represents median paired error difference (in km). Error bars represent bootstrapped 95% confidence intervals for the median difference (see Supplements). (c) Atmospheric river prediction (ivt) skill for GraphCast and HRES. The x-axis represents lead times (in days), and the y-axis represents RMSE. Error bars are 95% confidence intervals. (d) Extreme heat prediction precision-recall for GraphCast and HRES. The x-axis represents recall, and the y-axis represents precision. The curves represent different precision-recall trade-offs when sweeping over the gain applied to forecast signals (see Supplements).
Figure 4: Training GraphCast on more recent data. Each colored line represents GraphCast trained with data ending before a different year, from 2018 (blue) to 2021 (purple). The y-axis represents RMSE skill scores on 2021 test data, for z500, with respect to GraphCast trained on data ending before 2018, over lead times (x-axis). The vertical dashed line represents 3.5 days, where the HRES 06z/18z forecasts end. The black line represents HRES, where lead times earlier and later than 3.5 days are from the 06z/18z and 00z/12z initializations, respectively.
Figure 5: Schematic of HRES-fc0. Each horizontal line represents a forecast made by HRES, initialized at a different time (grey axis). HRES forecasts initialized from 00z and 12z make predictions up to 10 days lead time (blue axis), while HRES forecasts initialized from 06z and 18z make predictions up to 3.75 days. Each square represents a state predicted by HRES, in 6-hour increments (smaller time steps are omitted from the schematic, as well as states in the middle of a forecast trajectory). Red squares represent the forecast at time 0 for each HRES forecast, and define the data points included in HRES-fc0. The brown axis represents the validity time and allows visualizing the alignment of predictions from different initialization times. For instance, the error of the prediction made by HRES, initialized at 06z (second row of squares from the top), at 12h lead time, i.e., 18z validity time (3rd square from the left), would be measured against the first step of the HRES forecast initialized at 18z (red square in the last row of squares).
...and 48 more figures
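
The counts quoted in the captions of Figures 1 and 2 can be checked with a short Python sketch. The numbers for the grid, variables, levels, and lead times are taken from the captions; the formulas `10 * 4**r + 2` (nodes) and `30 * 4**r` (undirected edges) assume the standard icosahedron refinement in which each triangle is split into four, and `rmse_skill_score` is a hypothetical helper that assumes the skill score is a normalized RMSE difference, consistent with the sign convention stated for Figure 2(d).

```python
# Figure 1(a): grid and state size.
n_lat, n_lon = 721, 1440
grid_points = n_lat * n_lon
n_surface, n_atmospheric, n_levels = 5, 6, 37
vars_per_point = n_surface + n_atmospheric * n_levels
state_values = grid_points * vars_per_point


# Figure 1(g): refined icosahedral meshes M^0 ... M^6.
def icosphere_nodes(r: int) -> int:
    # Assumption: each refinement splits every triangle into four.
    return 10 * 4**r + 2


def icosphere_edges(r: int) -> int:
    # Undirected edge count under the same refinement assumption.
    return 30 * 4**r


mesh_nodes = [icosphere_nodes(r) for r in range(7)]
# The multi-mesh keeps the M^6 nodes and the edges of every level.
multimesh_undirected_edges = sum(icosphere_edges(r) for r in range(7))

# Figure 2(d): 5 atmospheric variables at 13 levels, 4 surface
# variables, each evaluated at 20 lead times.
n_targets = (5 * 13 + 4) * 20


def rmse_skill_score(rmse_model: float, rmse_baseline: float) -> float:
    # Assumed definition: negative values mean the model beats the baseline.
    return (rmse_model - rmse_baseline) / rmse_baseline
```

Under these assumptions the target count agrees with the 1380 verification targets quoted in the abstract, and a model RMSE below the baseline RMSE yields a negative skill score, matching the blue cells of the scorecard.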
