Exploring the design space of deep-learning-based weather forecasting systems

Shoaib Ahmed Siddiqui, Jean Kossaifi, Boris Bonev, Christopher Choy, Jan Kautz, David Krueger, Kamyar Azizzadenesheli

TL;DR

This work addresses the need to understand how design choices affect deep-learning weather forecasting systems. By systematically evaluating fixed-grid versus grid-invariant architectures, problem formulations (direct vs. delta prediction), pretraining strategies, input channels, loss functions, and dataset sizes on ERA5 data, the authors show that fixed-grid, computer-vision-style models generally outperform their grid-invariant counterparts under a fixed training budget. Delta prediction, zenith-angle augmentation, appropriate padding, multi-step fine-tuning, and larger datasets for smaller models emerge as consistently beneficial, while image-based pretraining shows mixed results. Based on these insights, the paper proposes a hybrid GraphUNet approach that combines fixed-grid performance with grid-invariant flexibility, and offers practical guidance for building more accurate and robust weather forecasting systems.
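The TL;DR lists zenith-angle augmentation among the consistently beneficial choices, but does not spell out the procedure. The sketch below assumes it means appending the cosine of the solar zenith angle as an extra input channel on the latitude-longitude grid. It uses a standard low-accuracy approximation (solar declination from the day of year, no equation-of-time correction), and the function names are hypothetical, not taken from the paper.

```python
import math
from typing import List, Sequence

# Assumption: "zenith-angle augmentation" = an extra cos(solar zenith) input channel.


def cos_solar_zenith(lat_deg: float, lon_deg: float, day_of_year: int, utc_hour: float) -> float:
    """Approximate cosine of the solar zenith angle (errors of ~1 degree expected)."""
    # Approximate solar declination (radians) for the given day of the year.
    decl = math.radians(-23.44) * math.cos(2.0 * math.pi / 365.0 * (day_of_year + 10))
    # Local solar time in hours: the sun moves 15 degrees of longitude per hour.
    solar_time = utc_hour + lon_deg / 15.0
    # Hour angle is zero at local solar noon.
    hour_angle = math.radians(15.0 * (solar_time - 12.0))
    lat = math.radians(lat_deg)
    return (math.sin(lat) * math.sin(decl)
            + math.cos(lat) * math.cos(decl) * math.cos(hour_angle))


def zenith_channel(lats: Sequence[float], lons: Sequence[float], day_of_year: int,
                   utc_hour: float, clip_night: bool = True) -> List[List[float]]:
    """Build a lat x lon grid of cos(zenith) to stack with the other input channels."""
    grid = []
    for lat in lats:
        row = []
        for lon in lons:
            c = cos_solar_zenith(lat, lon, day_of_year, utc_hour)
            # Optionally zero out the night side, where the sun is below the horizon.
            row.append(max(c, 0.0) if clip_night else c)
        grid.append(row)
    return grid
```

Because the channel is a deterministic function of time and position, it can be computed on the fly for any forecast step rather than stored with the dataset.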

Abstract

Despite tremendous progress in developing deep-learning-based weather forecasting systems, their design space, including the impact of different design choices, remains poorly understood. This paper aims to fill this knowledge gap by systematically analyzing these choices, including architecture, problem formulation, pretraining scheme, use of image-based pretrained models, loss functions, noise injection, multi-step inputs, additional static masks, multi-step fine-tuning (including larger-stride models), and training on a larger dataset. We study fixed-grid architectures such as UNet, fully convolutional architectures, and transformer-based models, along with grid-invariant architectures, including graph-based and operator-based models. Our results show that fixed-grid architectures outperform grid-invariant architectures, indicating a need for further architectural developments in grid-invariant models such as neural operators. We therefore propose a hybrid system that combines the strong performance of fixed-grid models with the flexibility of grid-invariant architectures. We further show that multi-step fine-tuning, already common practice in prior work, is essential for most deep-learning models to work well in practice. Pretraining objectives degrade performance compared to supervised training, while image-based pretrained models provide useful inductive biases in some cases compared to training from scratch. Interestingly, smaller models benefit strongly from training on a larger dataset, compared to training on a smaller dataset for longer. Larger models, on the other hand, primarily benefit from an increased computational budget. We believe that these results will aid in the design of better weather forecasting systems.
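To make the direct versus delta formulations and multi-step fine-tuning concrete, here is a minimal sketch in plain Python. It assumes a generic MSE loss and treats the model as any function mapping a state vector to a vector of the same length; the helper names are hypothetical, and only the loss is computed (a real setup would backpropagate through the rollout with an autodiff framework). It is not the authors' training code.

```python
from typing import Callable, List, Sequence

State = List[float]
Model = Callable[[State], State]


def step_direct(model: Model, x: State) -> State:
    # Direct formulation: the network outputs the next state itself.
    return model(x)


def step_delta(model: Model, x: State) -> State:
    # Delta formulation: the network outputs the change, which is added to x.
    return [xi + di for xi, di in zip(x, model(x))]


def mse(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b)) / len(a)


def rollout_loss(model: Model, step: Callable[[Model, State], State],
                 x0: State, targets: Sequence[State]) -> float:
    """Average MSE over an autoregressive rollout of len(targets) steps.

    With one target this is ordinary one-step training; with K targets each
    prediction is fed back as the next input, as in multi-step fine-tuning.
    """
    x, total = x0, 0.0
    for y in targets:
        x = step(model, x)  # the model consumes its own previous prediction
        total += mse(x, y)
    return total / len(targets)
```

With toy dynamics x_{t+1} = x_t + 1, a direct model that outputs x + 1.1 has a one-step loss of 0.01, but its three-step rollout loss grows to about 0.047 because errors compound. A multi-step objective exposes that compounding during fine-tuning, whereas one-step training never sees it.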

Paper Structure

This paper contains 36 sections, 6 equations, 21 figures, 20 tables.

Figures (21)

  • Figure 1: Geometric ACC (left) and RMSE (right) for comparison between design choices. This figure highlights the marginal contribution of each design decision explored in this work, i.e., the difference between the default choice (primarily based on our default 4-block UNet) and the best choice for that decision. Where the default turned out to be the best, the marginal contribution is zero. Interestingly, the marginal contributions of the design choices differ significantly between the two metrics.
  • Figure 2: Geometric ACC (left) and RMSE (right) comparing the direct and delta prediction formulations. Delta prediction almost always outperforms direct prediction at the 6h prediction horizon (results in tabular form are presented in Table \ref{direct_vs_delta_pred_ood_num_steps_test_loss.mean}).
  • Figure 3: Graph UNet architecture, in which a fixed-grid UNet is sandwiched between grid-invariant graph encoder and decoder layers. This gives the model the flexibility of grid invariance while retaining the performance of fixed-grid models.
  • Figure 4: Geometric ACC (left) and RMSE (right) comparing different architectures under the delta prediction formulation. Given the small, fixed number of training steps used in this work, architectures such as Segformer, SETR, ResNet-50 with a convolutional decoder, and UNet are more efficient and effective for weather forecasting (results in tabular form are presented in Table \ref{delta_pred_ood_num_steps_test_loss.mean}).
  • Figure 5: Geometric ACC (left) and RMSE (right) comparing different pretraining objectives on a 4-block UNet. Supervised pretraining outperforms self-supervised pretraining with each of the objectives evaluated (results in tabular form are presented in Table \ref{pretraining_4layers_ood_num_steps_test_loss.mean}).
  • ...and 16 more figures
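The figures report ACC and RMSE. The exact definition of the "Geometric ACC" used in the paper is not given in this summary, so the sketch below shows only the standard latitude-weighted RMSE and anomaly correlation coefficient (ACC) on a single 2-D field, as commonly defined for ERA5 benchmarks such as WeatherBench; how the paper aggregates these across variables or lead times is left as an open assumption. Weights are proportional to cos(latitude) so that the densely sampled polar rows of an equiangular grid do not dominate. Function names are hypothetical.

```python
import math
from typing import List, Sequence

Grid = List[List[float]]  # rows = latitudes, columns = longitudes


def lat_weights(lats_deg: Sequence[float]) -> List[float]:
    # cos(latitude) weights, normalised to mean 1, so polar rows count less.
    w = [math.cos(math.radians(lat)) for lat in lats_deg]
    mean_w = sum(w) / len(w)
    return [wi / mean_w for wi in w]


def weighted_rmse(pred: Grid, truth: Grid, lats_deg: Sequence[float]) -> float:
    w = lat_weights(lats_deg)
    total = 0.0
    for i, (prow, trow) in enumerate(zip(pred, truth)):
        for p, t in zip(prow, trow):
            total += w[i] * (p - t) ** 2
    return math.sqrt(total / (len(pred) * len(pred[0])))


def weighted_acc(pred: Grid, truth: Grid, clim: Grid, lats_deg: Sequence[float]) -> float:
    # Anomaly correlation: correlate forecast and observed anomalies
    # with respect to a climatology, using the same latitude weights.
    w = lat_weights(lats_deg)
    num = sp = st = 0.0
    for i in range(len(pred)):
        for j in range(len(pred[0])):
            a_p = pred[i][j] - clim[i][j]
            a_t = truth[i][j] - clim[i][j]
            num += w[i] * a_p * a_t
            sp += w[i] * a_p * a_p
            st += w[i] * a_t * a_t
    return num / math.sqrt(sp * st)
```

ACC is 1 for a forecast whose anomalies match the truth exactly and -1 for perfectly inverted anomalies, while RMSE is in the units of the variable, which is one reason the two metrics can rank design choices differently.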