FourCastNet 3: A geometric approach to probabilistic machine-learning weather forecasting at scale

Boris Bonev, Thorsten Kurth, Ankur Mahesh, Mauro Bisson, Jean Kossaifi, Karthik Kashinath, Anima Anandkumar, William D. Collins, Michael S. Pritchard, Alexander Keller

TL;DR

The work tackles the challenge of fast, probabilistic, global weather forecasting at high resolution while preserving physical spectral fidelity. It introduces FourCastNet 3 (FCN3), a spherical geometry–aware, purely convolutional neural operator framework integrated with a hidden Markov model to generate ensembles, trained with a joint spatial–spectral CRPS objective. FCN3 achieves state-of-the-art probabilistic skill with substantial speed advantages over traditional NWP and diffusion-based baselines, and demonstrates stable spectra out to 60 days. The approach is scalable to thousands of GPUs via Makani and is openly released, positioning FCN3 as a practical, extensible platform for next-generation subseasonal forecasting with large ensembles.

Abstract

FourCastNet 3 advances global weather modeling by implementing a scalable, geometric machine learning (ML) approach to probabilistic ensemble forecasting. The approach is designed to respect spherical geometry and to accurately model the spatially correlated probabilistic nature of the problem, resulting in stable spectra and realistic dynamics across multiple scales. FourCastNet 3 delivers forecasting accuracy that surpasses leading conventional ensemble models and rivals the best diffusion-based methods, while producing forecasts 8 to 60 times faster than these approaches. In contrast to other ML approaches, FourCastNet 3 demonstrates excellent probabilistic calibration and retains realistic spectra, even at extended lead times of up to 60 days. All of these advances are realized using a purely convolutional neural network architecture tailored for spherical geometry. Scalable and efficient large-scale training on 1024 GPUs and more is enabled by a novel training paradigm for combined model- and data-parallelism, inspired by domain decomposition methods in classical numerical models. Additionally, FourCastNet 3 enables rapid inference on a single GPU, producing a 60-day global forecast at 0.25°, 6-hourly resolution in under 4 minutes. Its computational efficiency, medium-range probabilistic skill, spectral fidelity, and rollout stability at subseasonal timescales make it a strong candidate for improving meteorological forecasting and early warning systems through large ensemble predictions.

Paper Structure

This paper contains 56 sections, 55 equations, 26 figures, 4 tables, 3 algorithms.

Figures (26)

  • Figure 1: Schematic of the FourCastNet 3 model. The model predicts the state of the atmosphere at the next timestep, given the state at the previous timestep. Auxiliary variables such as the cosine zenith angle are computed from analytical expressions for each timestep and appended to the input. A hidden Markov model is obtained by conditioning FourCastNet 3 on a stochastic latent variable whose temporal dynamics are governed by a diffusion process on the sphere. The model itself is formed by an encoder, a decoder and 8 neural operator blocks. Each of these operations can be grouped into local, global and pointwise operations and therefore be formulated on arbitrary grids and resolutions, making FourCastNet 3 discretization independent. Green boxes illustrate learnable operations.
  • Figure 2: Illustration of model- and data-parallelism for training of FourCastNet 3. In the given example, the input data is spatially distributed across four ranks (green boxes) by splitting it along the latitude dimension. This reduces the memory footprint of the input, prediction and activations within the network. The training data is read in a sharded fashion from the distributed file system, simultaneously lowering the required I/O per rank. This domain decomposition requires the model and its weights to be distributed; spherical harmonic transforms and discrete-continuous convolutions are distributed and split across the four ranks. In addition to the spatial model-parallelism, data-parallelism is utilized to distribute individual ensemble members and batch samples (grey boxes). Finally, the ensemble loss for a single sample is computed by taking the entire ensemble information across ensemble parallel ranks and spatial parallel ranks. On top of this, batch parallelism is utilized (not illustrated in this figure).
  • Figure 3: Probabilistic skill of FourCastNet 3 relative to the ERA5 ground truth. Continuous ranked probability scores (CRPS; lower is better), ensemble mean RMSE (lower is better), spread-skill ratios (closer to one is better) and rank histograms (more uniform is better) are reported from top to bottom. The scores are computed over 12-hourly initial conditions ranging from 2020-01-01 00:00:00 UTC to 2020-12-31 23:59:00 UTC.
  • Figure 4: FourCastNet 3 prediction of Storm Dennis initialized on 2020-02-11 at 00:00:00 UTC. The plot depicts wind speeds at the 850 hPa pressure level and isohypses (height contours) of the 500 hPa geopotential height. FCN3 accurately predicts the storm and its landfall 5 days in advance, with different ensemble members depicting different scenarios. FCN3 skillfully predicts global weather phenomena at a spatial resolution of $0.25^\circ$ and a temporal resolution of 6 hours. FCN3 exhibits exceptionally accurate and stable spectra even after extended rollouts of 30 days (720 hours) and more.
  • Figure 5: Comparison of angular power spectral densities of a single FourCastNet 3 ensemble member at a lead time of 360 hours to the ERA5 ground truth. Power spectral densities are averaged over 12-hourly initial conditions ranging from 2020-01-01 00:00:00 UTC to 2020-12-31 23:59:00 UTC.
  • ...and 21 more figures
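
The verification metrics named in the Figure 3 caption (CRPS, spread-skill ratio, rank histogram) can be illustrated with a small NumPy sketch. This is not the paper's evaluation code: the function names, the array layout (ensemble members along the first axis), and the unweighted spatial mean are assumptions made for illustration. A global evaluation would normally apply area (latitude) weights, and the spread-skill ratio here omits any finite-ensemble correction factor.

```python
import numpy as np


def ensemble_crps(ens, obs, fair=False):
    """Empirical CRPS of an ensemble, per verification point.

    ens: array of shape (M, ...) holding M ensemble members (assumed layout).
    obs: array of shape (...) holding the verifying values.
    fair=True uses the 1 / (2 M (M - 1)) normalisation (requires M >= 2).
    """
    ens = np.asarray(ens, dtype=float)
    obs = np.asarray(obs, dtype=float)
    m = ens.shape[0]
    # Mean absolute error of the members against the observation.
    skill = np.abs(ens - obs[None]).mean(axis=0)
    # Sum of absolute differences over all member pairs (i, j).
    pair = np.abs(ens[:, None] - ens[None, :]).sum(axis=(0, 1))
    denom = 2.0 * m * (m - 1) if fair else 2.0 * m * m
    return skill - pair / denom


def spread_skill_ratio(ens, obs):
    """Ratio of ensemble spread to ensemble-mean RMSE (unweighted mean)."""
    ens = np.asarray(ens, dtype=float)
    obs = np.asarray(obs, dtype=float)
    # Spread: square root of the mean (unbiased) ensemble variance.
    spread = np.sqrt(ens.var(axis=0, ddof=1).mean())
    # Skill: RMSE of the ensemble mean against the observations.
    rmse = np.sqrt(((ens.mean(axis=0) - obs) ** 2).mean())
    return spread / rmse


def rank_histogram(ens, obs):
    """Counts of the observation's rank among the M members (M + 1 bins)."""
    ens = np.asarray(ens, dtype=float)
    obs = np.asarray(obs, dtype=float)
    # Rank = number of members strictly below the observation.
    ranks = (ens < obs[None]).sum(axis=0).ravel()
    return np.bincount(ranks, minlength=ens.shape[0] + 1)


if __name__ == "__main__":
    # Toy example: three members at a single point, observation at 2.0.
    print(ensemble_crps([1.0, 2.0, 3.0], 2.0))  # about 0.222 (= 2/9)

    # Synthetic ensemble and observation drawn from the same distribution.
    rng = np.random.default_rng(0)
    members = rng.normal(size=(16, 10_000))
    truth = rng.normal(size=10_000)
    print(spread_skill_ratio(members, truth))
    print(rank_histogram(members, truth))
```

In this sketch a spread-skill ratio near one and a roughly flat rank histogram indicate that the observation is statistically indistinguishable from an ensemble member, which is the sense of "closer to one is better" and "more uniform is better" in the caption.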