Skillful joint probabilistic weather forecasting from marginals

Ferran Alet, Ilan Price, Andrew El-Kadi, Dominic Masters, Stratis Markou, Tom R. Andersson, Jacklynn Stott, Remi Lam, Matthew Willson, Alvaro Sanchez-Gonzalez, Peter Battaglia

TL;DR

FGN is a scalable probabilistic weather forecasting framework that generates ensembles from two sources of diversity: epistemic uncertainty, modeled by an ensemble of independently trained models, and aleatoric uncertainty, modeled by learned perturbations in parameter space driven by a low-dimensional global noise vector. The models are trained end-to-end to minimize marginal CRPS. Despite this marginal-only objective, FGN captures joint spatial dependencies and achieves state-of-the-art performance on marginal skill, joint-structure metrics, and tropical cyclone tracks while remaining computationally efficient. It outperforms the prior state of the art, GenCast, across a broad set of metrics, reduces forecast bias, preserves physically plausible spectral content, and produces robust cyclone-track forecasts, suggesting strong practical impact for probabilistic weather prediction. The paper also discusses artifacts and seeds as challenges, and outlines extensions to direct cyclone forecasting, underscoring FGN’s potential as a general-purpose framework for learned, joint-distribution weather modeling.

Abstract

Machine learning (ML)-based weather models have rapidly risen to prominence due to their greater accuracy and speed than traditional forecasts based on numerical weather prediction (NWP), recently outperforming traditional ensembles in global probabilistic weather forecasting. This paper presents FGN, a simple, scalable and flexible modeling approach which significantly outperforms the current state-of-the-art models. FGN generates ensembles via learned model-perturbations with an ensemble of appropriately constrained models. It is trained directly to minimize the continuous ranked probability score (CRPS) of per-location forecasts. It produces state-of-the-art ensemble forecasts as measured by a range of deterministic and probabilistic metrics, makes skillful ensemble tropical cyclone track predictions, and captures joint spatial structure despite being trained only on marginals.

Paper Structure

This paper contains 36 sections, 5 equations, 34 figures, 2 tables.

Figures (34)

  • Figure 1: An overview of the FGN generative process for producing a single step of a forecast ensemble from a single pair of input frames ($X^{t-2:t-1}$). Diversity is introduced at two levels, modeling aleatoric and epistemic uncertainty respectively. For a given model $\mathcal{M}_j$, aleatoric uncertainty is introduced at each step of a forecast trajectory by sampling a low-dimensional noise vector $\epsilon^t_i$ used for parameter-shared conditional normalization during the model forward pass. This can be interpreted as applying a perturbation to the neural network weights to obtain $\theta_i^t$, and hence, as sampling the parameters of the neural network. To generate $N$ ensemble members with aleatoric uncertainty, we simply condition on $N$ different $\epsilon_i^t$ independently. The epistemic uncertainty is modeled by ensembling outputs of multiple models $\mathcal{M}_j$ (each with their own $\{\theta^*_j, \Delta_j\}$ parameters) trained independently, each of which generates a subset of ensemble members according to the procedure described above.
  • Figure 2: FGN produces more skillful marginal forecasts than GenCast while preserving good calibration. (a) A scorecard comparing CRPS achieved by FGN and GenCast, where blue cells indicate FGN outperforms GenCast, red denotes where GenCast is better, and hatched regions indicate where the difference in performance was not statistically significant. FGN is significantly better ($p < 0.05$) in 99.9% of cases, with an average improvement of 6.5%. (b-f) Spread-skill plots for a number of variables across all lead times. FGN maintains a spread-skill ratio very close to 1 across all lead times, indicating well-calibrated ensemble spread. (g, h) REV for forecasts of extreme high (>99.99th percentile) 2m temperature and 10m wind speed, at lead times of 1, 5, and 7 days. FGN achieves better REV than GenCast in the case of 10m wind speed, and matches performance on 2m temperature.
  • Figure 3: FGN produces skillful joint forecast distributions. (a, b) CRPS scorecards evaluating average-pooled (respectively max-pooled) fields, for different pool sizes, at 1 and 7 day lead times. FGN achieves better CRPS in 99.9% of average-pooled (respectively 99% of max-pooled) pool-size, variable, lead-time, and level combinations, evidencing that FGN skillfully models spatial correlations in the forecast distribution. (c, d) Evaluations of skill on two quantities of interest derived from predicted fields. FGN obtains $10.4\%$ and $15.6\%$ better CRPS on 10m wind speed and $z300 - z500$ at short lead times respectively, with this improvement decreasing as lead time increases. This shows FGN is able to capture across-variable dependencies despite not being directly trained on them. (e-j) Spectra of FGN compared to GenCast and the ground truth. In some variables (e.g. z500), FGN has a spike at the mesh frequency and some additional high-frequency content, more so than GenCast, but orders of magnitude smaller than the dominant powers in the signal.
  • Figure 4: FGN achieves state-of-the-art cyclone track prediction. (a) Position error of the ensemble mean track. FGN achieves up to a 24h improvement in position error over GenCast. Some of this improvement is due to the TempestExtremes cyclone tracker working better on 6-hour timesteps than 12-hour timesteps. However, as shown in the plot, a 12h-step version of FGN still achieves better position error beyond 2 day lead times. (b) REV of track probability predictions. FGN exhibits better REV than GenCast across lead times of up to 5 days, across all cost/loss ratios at which either model is better than climatology.
  • Figure 5: Visualization of a q300 forecast at 15 day lead time, zoomed in to highlight the 'honeycomb' artifacts visible in this variable.
  • ...and 29 more figures
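
The generative process described in the Figure 1 caption can be illustrated with a toy NumPy sketch. Everything concrete here is an assumption made for illustration and does not reflect the paper's architecture: the tiny per-location network, the residual update, the sizes `NOISE_DIM` and `HIDDEN`, and the names `init_model`, `conditional_layer_norm`, `step`, and `forecast_ensemble`. The sketch keeps only the two ideas from the caption: a low-dimensional noise vector that modulates normalization layers identically at every location, and an outer ensemble over independently initialized models.

```python
import numpy as np

NOISE_DIM = 4   # hypothetical size of the low-dimensional noise vector
HIDDEN = 8      # hypothetical hidden width of the toy network

def init_model(rng, n_features):
    """Create one toy model M_j; a stand-in for an independently trained network."""
    return {
        "w_in": 0.1 * rng.normal(size=(2 * n_features, HIDDEN)),
        "w_out": 0.1 * rng.normal(size=(HIDDEN, n_features)),
        "w_scale": 0.1 * rng.normal(size=(NOISE_DIM, HIDDEN)),
        "w_shift": 0.1 * rng.normal(size=(NOISE_DIM, HIDDEN)),
    }

def conditional_layer_norm(h, noise, params):
    """Layer norm whose scale and shift depend on the shared noise vector."""
    mean = h.mean(axis=-1, keepdims=True)
    std = h.std(axis=-1, keepdims=True)
    normed = (h - mean) / (std + 1e-6)
    # The same scale and shift apply at every location, so a single noise
    # draw perturbs the whole field coherently.
    scale = 1.0 + noise @ params["w_scale"]
    shift = noise @ params["w_shift"]
    return normed * scale + shift

def step(params, x_prev2, x_prev1, noise):
    """One forecast step from two input frames of shape (locations, features)."""
    inputs = np.concatenate([x_prev2, x_prev1], axis=-1)
    h = np.tanh(inputs @ params["w_in"])
    h = conditional_layer_norm(h, noise, params)
    # Residual update on the latest frame (an assumption of this sketch).
    return x_prev1 + h @ params["w_out"]

def forecast_ensemble(models, x_prev2, x_prev1, members_per_model, rng):
    """Each model contributes members, each with an independent noise draw."""
    members = []
    for params in models:
        for _ in range(members_per_model):
            noise = rng.normal(size=NOISE_DIM)
            members.append(step(params, x_prev2, x_prev1, noise))
    return np.stack(members)

rng = np.random.default_rng(0)
models = [init_model(rng, n_features=3) for _ in range(2)]
frame = rng.normal(size=(5, 3))  # 5 locations, 3 variables
ens = forecast_ensemble(models, frame, frame, members_per_model=4, rng=rng)
```

Here `ens` stacks eight members, four from each of the two toy models. In a multi-step rollout, a fresh noise vector would be drawn at every step of each member's trajectory, as the caption describes.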