Technical overview and architecture of the FastNet Machine Learning weather prediction model, version 1.0

Eric G. Daub, Tom Dunstan, Thusal Bennett, Matthew Burnand, James Chappell, Alejandro Coca-Castro, Noushin Eftekhari, J. Scott Hosking, Manvendra Janmaijaya, Jon Lillis, David Salvador-Jasin, Nathan Simpson, Oliver T Strickson, Ryan Sze-Yin Chan, Mohamad Elmasri, Lydia Allegranza France, Sam Madge, Aled Owen, James Robinson, Adam A. Scaife, David Walters, Peter Yatsyshin, Theo McCaie, Levan Bokeria, Hannah Brown, Tom Dodds, David Llewellyn-Jones, Sophia Moreton, Tom Potter, Iain Stenson, Louisa van Zeeland, Karina Bett-Williams, Kirstine Ida Dale

TL;DR

FastNet introduces a deterministic Graph Neural Network framework for global medium-range weather prediction using a multilevel icosahedral mesh and an encode–process–decode pipeline. Trained on ERA5 reanalysis and optimized via autoregressive rollout, it achieves RMSE and ACC that are competitive with the Met Office Global Model across a hold-out year, at both $1^{\circ}$ and $0.25^{\circ}$ resolutions. The approach leverages residual forecasting, multi-scale connectivity, and carefully weighted training losses to balance short- and long-range spatial information, showing strong potential for operational use pending prospective validation. Overall, FastNet demonstrates that data-driven, GNN-based global NWP can reach skill levels comparable to traditional physics-based systems while offering scalability and resolution flexibility.

Abstract

We present FastNet version 1.0, a data-driven medium-range numerical weather prediction (NWP) model based on a Graph Neural Network architecture, developed jointly by the Alan Turing Institute and the Met Office. FastNet uses an encode-process-decode structure to produce deterministic global weather predictions out to 10 days. The architecture is independent of spatial resolution, and we have trained models at 1$^{\circ}$ and 0.25$^{\circ}$ resolution with a six-hour time step. FastNet uses a multi-level mesh in the processor, which captures both short-range and long-range patterns in the spatial structure of the atmosphere. The model is pre-trained on ECMWF's ERA5 reanalysis data and then fine-tuned on additional autoregressive rollout steps, which improves accuracy over longer time horizons. We evaluate model performance at 1.5$^{\circ}$ resolution using 2022 as a hold-out year and compare with the Met Office Global Model, finding that FastNet surpasses the skill of the current Met Office Global Model NWP system on a variety of evaluation metrics for a number of atmospheric variables. Our results show that both our 1$^{\circ}$ and 0.25$^{\circ}$ FastNet models outperform the current Global Model and produce results with predictive skill approaching that of other data-driven models trained on 0.25$^{\circ}$ ERA5.
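The residual, autoregressive forecasting loop described in the abstract can be illustrated with a minimal sketch. This is not the FastNet implementation: `rollout` and `toy_increment` are hypothetical names, and the toy increment function merely stands in for the learned encode-process-decode network. The sketch only shows how a six-hour residual update is fed back into itself to reach a 10-day lead time.

```python
import numpy as np

def rollout(step_fn, x0, n_steps):
    """Autoregressive rollout with a residual update: x_{k+1} = x_k + step_fn(x_k)."""
    states = [np.asarray(x0, dtype=float)]
    for _ in range(n_steps):
        x = states[-1]
        # The model output is treated as an increment added to the current state.
        states.append(x + step_fn(x))
    # Shape: (n_steps + 1, *x0.shape); row 0 is the initial state.
    return np.stack(states)

def toy_increment(x):
    """Hypothetical stand-in for the trained network (assumption, not FastNet)."""
    return -0.1 * x

STEP_HOURS = 6                      # time step stated in the abstract
n_steps = 10 * 24 // STEP_HOURS     # 10-day forecast -> 40 steps
forecast = rollout(toy_increment, np.ones(4), n_steps)
```

With a six-hour step, a 10-day forecast needs 40 model applications, so any per-step error is compounded 40 times. This is presumably why fine-tuning on several rollout steps, rather than on a single step only, helps at longer lead times.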

Paper Structure

This paper contains 22 sections, 7 equations, 9 figures, 1 table.

Figures (9)

  • Figure 1: A high-level overview of the FastNet architecture. Data on the earth grid is first mapped to the mesh through an encode step via a Graph Neural Network, which embeds the grid features into a latent space. Once the weather state at $t=0$ is encoded onto the mesh, $N$ rounds of message-passing are carried out on the mesh to update the node and edge states via multi-layer perceptrons (MLPs), with a separate MLP for each round. The state of the mesh is then decoded back to the grid and added to the initial grid state as a residual connection. This advances the model forward one timestep ($t=6$ h). Longer forecasts are obtained by autoregressively passing the model predictions back into the model and rolling out the forecasts over the desired lead time. We examine lead times up to 10 days, though our main comparisons with the UK Met Office GM are done out to a 7-day lead time. Note that the encoder and decoder graph connectivity in the figure are for illustrative purposes and are not representative of the actual model details; a full description of how these graphs are constructed is provided in the main text.
  • Figure 2: Illustration of the multimesh used in FastNet (Lam et al., 2023). (a) Refined icosahedra used to construct the multimesh, including the base icosahedron and 3 levels of refinement. The $\approx 2^{\circ}$ O96 mesh uses two additional refinement levels beyond the finest mesh shown, and the $\approx 1^{\circ}$ N320 mesh uses three additional refinement levels. (b) Combined multimesh, resulting from unifying all mesh levels into a single structure using the finest mesh nodes but combining the node connections from all refinement levels. The combined multimesh includes both short-range and long-range connections that efficiently pass information across the globe.
  • Figure 3: The effect of autoregressive fine-tuning on performance at a range of lead times for FastNet O96 and N320. The figure shows RMSE for temperature at 850 hPa for additional lead times in fine-tuning (horizontal axis) at varying forecast lead times (colour). Fine-tuning generally improves RMSE at longer forecast lead times as expected, until seven or eight additional lead times, when it starts increasing at 72 hours and roughly plateaus in the longer lead times. This suggests that the blurring induced by adding additional lead times goes beyond the optimal level and degrades performance. The single step (6 h) RMSE is expected to increase (get worse) with additional fine-tuning, since this is no longer the optimisation target. This is generally observed, except that O96 with one step of fine-tuning shows a slight improvement, suggesting there was more performance that could have been realised during pre-training for this model.
  • Figure 4: RMSE of power spectrum in the synoptic scale (200–2000 km). Lower RMSE indicates a better match to the frequency content in the ground truth data, and a higher value is associated with more blurry spatial features. Notice that this increases with forecast lead time, and tends to plateau. It also increases with additional fine-tuning steps. The lowest RMSE in 850 hPa temperature as shown in Fig. 3 occurs after seven or eight additional fine-tuning steps, and appears to 'converge' in spectral RMSE. Additional fine-tuning (9, 10, 11 steps) increases both forecast RMSE and spectral RMSE, suggesting a failure in training.
  • Figure 5: Example predictions of specific humidity at 500 hPa from O96 FastNet for several lead times (6 h, 24 h, 48 h, 120 h, and 240 h, all same valid time) and ground truth ERA5 for comparison. FastNet is able to capture the general weather patterns over the typical lead time range of medium-range weather prediction, with some blurring occurring in the predictions for longer lead times in the fine-tuned model as the model becomes more uncertain.
  • ...and 4 more figures
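
The multimesh sizes implied by the Figure 2 caption can be worked out with a short sketch. It assumes the usual refinement rule in which every triangular face is split into four (the caption does not state the rule explicitly), and it counts undirected edges; a message-passing graph would typically store each edge in both directions. The function names are illustrative and not taken from the FastNet code.

```python
def icosphere_counts(level):
    """Nodes, edges, faces of an icosahedron after `level` rounds of 4-way face splitting."""
    faces = 20 * 4 ** level
    edges = 30 * 4 ** level
    nodes = 10 * 4 ** level + 2   # consistent with Euler's formula V - E + F = 2
    return nodes, edges, faces

def multimesh_counts(finest_level):
    """Nodes and undirected edges of the combined multimesh.

    Nodes come from the finest level only; edges are pooled from every level,
    from the base icosahedron (level 0) up to `finest_level`. Edges of
    different levels never coincide because each refinement halves edge length.
    """
    nodes = icosphere_counts(finest_level)[0]
    edges = sum(icosphere_counts(lvl)[1] for lvl in range(finest_level + 1))
    return nodes, edges

# The caption shows 3 refinement levels; O96 adds two more and N320 adds three.
o96_nodes, o96_edges = multimesh_counts(3 + 2)
n320_nodes, n320_edges = multimesh_counts(3 + 3)
```

Under these assumptions the O96 multimesh has 10,242 nodes and 40,950 undirected edges, and the N320 multimesh has 40,962 nodes and 163,830 undirected edges. The coarse levels add only about a third more edges than the finest level alone, yet they supply the long-range connections.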