ArchesWeather: An efficient AI weather forecasting model at 1.5° resolution

Guillaume Couairon, Christian Lessig, Anastase Charantonis, Claire Monteleoni

TL;DR

ArchesWeather tackles the cost and scalability challenge in AI-based weather forecasting by questioning the necessity of local 3D attention and introducing Cross-Level Attention (CLA) to enable efficient vertical information exchange. The model employs a Swin U-Net transformer with Earth-specific biases and processes ERA5 data at 1.5-degree resolution for a 24-hour lead time, achieving competitive RMSE with a fraction of the training budget compared to larger baselines. CLA reduces parameter burden by performing vertical column-wise attention, enabling global vertical interaction and faster inference, with additional gains from fine-tuning on recent ERA5 data. Overall, ArchesWeather demonstrates that high-skill forecasts at moderate resolution are achievable on academic resources, with potential for downstream downscaling or diffusion-based refinement to finer scales.

Abstract

One of the guiding principles for designing AI-based weather forecasting systems is to embed physical constraints as inductive priors in the neural network architecture. A popular prior is locality, where the atmospheric data is processed with local neural interactions, like 3D convolutions or 3D local attention windows as in Pangu-Weather. On the other hand, some works have shown great success in weather forecasting without this locality principle, at the cost of a much higher parameter count. In this paper, we show that the 3D local processing in Pangu-Weather is computationally sub-optimal. We design ArchesWeather, a transformer model that combines 2D attention with a column-wise attention-based feature interaction module, and demonstrate that this design improves forecasting skill. ArchesWeather is trained at 1.5° resolution and 24h lead time, with a training budget of a few GPU-days and a lower inference cost than competing methods. An ensemble of four of our models shows better RMSE scores than the IFS HRES and is competitive with the 1.4° 50-member NeuralGCM ensemble for forecasts one to three days ahead. Our code and models are publicly available at https://github.com/gcouairon/ArchesWeather.
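
The column-wise interaction described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a single attention head, uses the features themselves as queries, keys and values (no learned projections), and omits residual connections and normalization. The function names `column_attention` and `cross_level_attention` are hypothetical. The only point it shows is that each horizontal grid point attends over all of its vertical levels at once, independently of the other columns.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def column_attention(column):
    """Single-head self-attention over the vertical levels of one column.

    column: list of Z feature vectors (one per level), each of length C.
    Assumption: queries, keys and values are the raw features.
    """
    dim = len(column[0])
    scale = 1.0 / math.sqrt(dim)
    mixed = []
    for q in column:
        # Scaled dot-product score of this level against every level.
        scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in column]
        weights = softmax(scores)
        # Weighted average of the value vectors across all levels.
        mixed.append(
            [sum(w * v[c] for w, v in zip(weights, column)) for c in range(dim)]
        )
    return mixed


def cross_level_attention(x):
    """Apply column_attention independently at each horizontal position.

    x[z][h][w] is a feature vector of length C (level, latitude, longitude).
    Returns a new nested list with the same layout.
    """
    Z, H, W = len(x), len(x[0]), len(x[0][0])
    y = [[[None] * W for _ in range(H)] for _ in range(Z)]
    for h in range(H):
        for w in range(W):
            col = [x[z][h][w] for z in range(Z)]
            for z, vec in enumerate(column_attention(col)):
                y[z][h][w] = vec
    return y
```

In this sketch every level in a column sees every other level in a single layer, whereas horizontal mixing is left to a separate 2D attention step that is not shown here.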

Paper Structure (21 sections, 1 equation, 9 figures, 4 tables)

This paper contains 21 sections, 1 equation, 9 figures, 4 tables.

Figures (9)

  • Figure 1: Relative RMSE improvement over the IFS HRES as a function of training computational budget, averaged over key upper-air variables (Z500, Q700, T850, U850 and V850) and lead times of 24h/48h/72h. Circle size indicates training resolution: small circles for 0.25°/0.7°, large circles for 1°/1.4°/1.5°. ArchesWeather reaches competitive forecasting performance with a much smaller training budget.
  • Figure 2: Comparison of attention schemes used in Pangu-Weather (left) versus ours (right).
  • Figure 3: Comparison of attention schemes used in Pangu (Local Attention, left), Stormer/FuXi (Concatenated columns, middle) and ours (Cross-level Attention, right). For each scheme, a single vertical column is represented to illustrate how each layer processes column-wise information. RF stands for Receptive Field.
  • Figure 4: Geopotential (left) and wind speed (right) RMSE of a model without fine-tuning, for each year in the training set. Test RMSE values (year 2020) are shown as dotted lines.
  • Figure 5: Architecture of our convolutional head.
  • ...and 4 more figures