ArchesWeather & ArchesWeatherGen: a deterministic and generative model for efficient ML weather forecasting

Guillaume Couairon, Renu Singh, Anastase Charantonis, Christian Lessig, Claire Monteleoni

TL;DR

This paper tackles the need for probabilistic weather forecasting by combining a deterministic transformer model, ArchesWeather, with a probabilistic emulator, ArchesWeatherGen, trained via flow matching. It introduces Cross-Level Attention to replace vertical-local interactions, enabling broader vertical information exchange at reduced parameter cost. By decomposing forecast learning into a deterministic mean and a residual stochastic component, the authors achieve superior ensemble skill while cutting training budgets, and demonstrate that ArchesWeatherGen outperforms IFS ENS and NeuralGCM on multiple variables within practical compute budgets. The work advances accessible ML-based weather forecasting, offers thorough ablations, and provides open-source pipelines for reproducibility and broader adoption in the community.

Abstract

Weather forecasting plays a vital role in today's society, from agriculture and logistics to predicting the output of renewable energies and preparing for extreme weather events. Deep learning weather forecasting models trained with the next state prediction objective on ERA5 have shown great success compared to numerical global circulation models. However, for a wide range of applications, being able to provide representative samples from the distribution of possible future weather states is critical. In this paper, we propose a methodology to leverage deterministic weather models in the design of probabilistic weather models, leading to improved performance and reduced computing costs. We first introduce ArchesWeather, a transformer-based deterministic model that improves upon Pangu-Weather by removing overly restrictive inductive priors. We then design a probabilistic weather model called ArchesWeatherGen based on flow matching, a modern variant of diffusion models, that is trained to project ArchesWeather's predictions to the distribution of ERA5 weather states. ArchesWeatherGen is a true stochastic emulator of ERA5 and surpasses IFS ENS and NeuralGCM on all WeatherBench headline variables (except for NeuralGCM's geopotential). Our work also aims to democratize the use of deterministic and generative machine learning models in weather forecasting research, with academic computing resources. All models are trained at 1.5° resolution, with a training budget of $\sim$9 V100 days for ArchesWeather and $\sim$45 V100 days for ArchesWeatherGen. For inference, ArchesWeatherGen generates 15-day weather trajectories at a rate of 1 minute per ensemble member on an A100 GPU card. To make our work fully reproducible, our code and models are open source, including the complete pipeline for data preparation, training, and evaluation, at https://github.com/INRIA/geoarches.
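The idea of generating a forecast as a deterministic mean plus a sampled residual can be illustrated with a toy sketch. This is not the authors' implementation: the trained flow-matching network is replaced here by a closed-form velocity field that is only valid when the residuals are zero-mean Gaussian, and the names `toy_velocity`, `sample_residual`, `sample_next_state`, `target_std`, and `residual_scale` are hypothetical. The linear noise-to-data path and the Euler integrator are also assumptions, chosen for simplicity.

```python
import numpy as np


def toy_velocity(x, t, target_std):
    """Closed-form flow-matching velocity for a zero-mean Gaussian target.

    Stand-in for a trained network (assumption). For the linear path
    x_t = (1 - t) * noise + t * residual, with noise ~ N(0, 1) and
    residual ~ N(0, target_std**2) independent, the conditional
    expectation E[residual - noise | x_t = x] is linear in x.
    """
    var_t = (1.0 - t) ** 2 + (t * target_std) ** 2
    return (t * target_std**2 - (1.0 - t)) / var_t * x


def sample_residual(shape, target_std, n_steps, rng):
    """Integrate dx/dt = v(x, t) from t=0 (pure noise) to t=1 with Euler steps."""
    x = rng.standard_normal(shape)
    dt = 1.0 / n_steps
    for k in range(n_steps):
        x = x + dt * toy_velocity(x, k * dt, target_std)
    return x


def sample_next_state(mean_forecast, residual_scale, target_std, n_steps, rng):
    """One generative step: deterministic mean plus a rescaled sampled residual."""
    r = sample_residual(mean_forecast.shape, target_std, n_steps, rng)
    return mean_forecast + residual_scale * r
```

Calling `sample_next_state` repeatedly with the same `mean_forecast` yields different ensemble members, and feeding each sample back in as the next input would give an autoregressive trajectory. In a real system, `toy_velocity` would be a neural network conditioned on the previous state and the deterministic prediction.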

Paper Structure

This paper contains 53 sections, 7 equations, 17 figures, 2 tables.

Figures (17)

  • Figure 1: Summary evaluation metrics (higher is better) on key upper air variables (Z500, Q700, T850, U850 and V850) as a function of training computational budget, comparing ArchesWeather and ArchesWeatherGen to other deterministic and ensemble-based models. Left: RMSE skill score over IFS HRES averaged for lead times of 1 to 3 days, comparing ArchesWeather to other state-of-the-art deterministic ML models in WeatherBench2. Circle size indicates training resolution: small circles for 0.25°/0.7°, big circles for 1°/1.4°/1.5°. Compared to the other models trained at a similar resolution, ArchesWeather reaches competitive or better forecasting performance with a much smaller training budget. Right: Ensemble metrics skill scores over IFS ENS, averaged for lead times of 3 to 10 days. Upper Right: ArchesWeatherGen reaches better fair CRPS scores (fCRPS, see the evaluation section) than NeuralGCM at a much lower computational budget, and our flow-matching-based design greatly improves upon the original DDPM diffusion models. Lower Right: ArchesWeatherGen also better approximates the Ensemble Mean than competing methods, including the deterministic model FuXi that was explicitly trained to match the ensemble mean. GraphCast is shown as reference for a non-ensemble-based deterministic model, but was not trained to optimize this metric at 3-10 days.
  • Figure 2: Visual comparison of the attention schemes used in FuXi/Stormer (left) and Pangu-Weather (middle) versus ours (right). Our design has the largest receptive field without requiring a parameter count that scales quadratically with the number of layers $Z$.
  • Figure 3: Geopotential (left) and wind speed (right) RMSE of a model without multi-step fine-tuning, for each year in the training set. The error is lower in the recent past, which we attribute to a better observed, more constrained, and more predictable dynamical system. Additionally, test RMSEs (year 2020) are shown as dotted lines; they indicate some overfitting compared to the scores in 2018 (the last year in the training set).
  • Figure 4: Overview of the ArchesWeatherGen training pipeline. (1) We train four ArchesWeather models by training neural networks to predict the next state with MSE loss. (2) We compute normalized residuals on the training and OOD sets, using ArchesWeather-Mx4, which is the average of 4 ArchesWeather models. (3) We train our flow matching model on the residual data $\mathbf{r}$, i.e., we train a neural network to map residuals corrupted with Gaussian noise to their uncorrupted version. (4) We sample ArchesWeatherGen by predicting the mean component with ArchesWeather-Mx4, then iteratively denoise Gaussian noise to generate a residual sample, which is normalized and added to the mean to recover a complete sample of $\mathbf{x}_{t+\delta}$. The sampling process is then used autoregressively starting again with $t = t + \delta$ to generate multi-step trajectories. In the paper, we consider $\delta=24h$ and $\rho=1.05$.
  • Figure 5: RMSE skill scores of weather models for lead times up to 10 days. Models that don't use ensembling are shown in dotted lines. We can see that ArchesWeather-Mx4 surpasses Stormer on all headline metrics for lead times up to 8 days, despite using 4 times fewer ensemble members (4 vs 16).
  • ...and 12 more figures
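
The fair CRPS (fCRPS) mentioned in the Figure 1 caption can be sketched as follows. The paper's exact definition lives in its evaluation section, which is not reproduced here, so this sketch assumes the standard unbiased ensemble estimator: the mean absolute error of the members against the truth, minus the sum of absolute differences over all ordered member pairs divided by 2M(M-1) for M members. The function name `fair_crps` is hypothetical, and latitude weighting and averaging over time are deliberately left out.

```python
import numpy as np


def fair_crps(ensemble, truth):
    """Fair CRPS per grid point (lower is better).

    ensemble: array of shape (M, ...), one entry per ensemble member.
    truth:    array of shape (...), the verifying state.
    """
    ensemble = np.asarray(ensemble, dtype=float)
    truth = np.asarray(truth, dtype=float)
    m = ensemble.shape[0]
    if m < 2:
        raise ValueError("fair CRPS needs at least two ensemble members")
    # Skill term: mean absolute error of the members against the truth.
    skill = np.abs(ensemble - truth).mean(axis=0)
    # Spread term: absolute differences summed over all ordered member pairs.
    pairwise = np.abs(ensemble[:, None] - ensemble[None, :]).sum(axis=(0, 1))
    spread = pairwise / (2.0 * m * (m - 1))
    return skill - spread
```

Dividing the spread term by M(M-1) rather than M² removes the finite-ensemble bias, which is what allows ensembles of different sizes to be compared on the same footing. Note that the pairwise term allocates an array of shape (M, M, ...), so large grids may need to be processed in chunks.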