Exploring the design space of deep-learning-based weather forecasting systems
Shoaib Ahmed Siddiqui, Jean Kossaifi, Boris Bonev, Christopher Choy, Jan Kautz, David Krueger, Kamyar Azizzadenesheli
TL;DR
This work examines how design choices affect deep-learning weather forecasting systems. By systematically evaluating fixed-grid versus grid-invariant architectures, problem formulations (direct vs. delta prediction), pretraining strategies, input channels, loss functions, and dataset sizes on ERA5 data, the authors find that fixed-grid, computer-vision-style models generally outperform grid-invariant counterparts under a fixed budget. Delta prediction, zenith-angle augmentation, appropriate padding, multi-step fine-tuning, and training smaller models on larger datasets emerge as consistently beneficial, while image-based pretraining shows mixed results. Based on these insights, the paper advocates a hybrid GraphUNet approach that combines fixed-grid performance with grid-invariant flexibility, and it offers practical guidance for building more accurate and robust weather forecasting systems.
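To make the direct-versus-delta distinction concrete, here is a minimal NumPy sketch. It is an illustration under stated assumptions, not the authors' implementation: `toy_model`, the function names, and the `(channels, lat, lon)` array layout are all hypothetical stand-ins.

```python
import numpy as np


def toy_model(state):
    """Hypothetical stand-in for a trained network (not from the paper)."""
    return 0.1 * state


def direct_step(model, state):
    """Direct formulation: the model output is the next state itself."""
    return model(state)


def delta_step(model, state):
    """Delta formulation: the model output is a change added to the input."""
    return state + model(state)


def training_target(current, future, formulation):
    """Regression target for each formulation."""
    if formulation == "direct":
        return future
    if formulation == "delta":
        return future - current
    raise ValueError(f"unknown formulation: {formulation}")


# Assumed layout: (channels, lat, lon).
state = np.ones((2, 4, 8))
future = np.full((2, 4, 8), 1.1)

next_direct = direct_step(toy_model, state)
next_delta = delta_step(toy_model, state)
delta_target = training_target(state, future, "delta")
```

In the delta formulation the network only has to model the change between consecutive states, which is typically much smaller in magnitude than the state itself.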
Abstract
Despite tremendous progress in developing deep-learning-based weather forecasting systems, their design space, including the impact of different design choices, is not yet well understood. This paper aims to fill this knowledge gap by systematically analyzing these choices, including architecture, problem formulation, pretraining scheme, use of image-based pretrained models, loss functions, noise injection, multi-step inputs, additional static masks, multi-step fine-tuning (including larger stride models), and training on a larger dataset. We study fixed-grid architectures such as UNet, fully convolutional architectures, and transformer-based models, along with grid-invariant architectures, including graph-based and operator-based models. Our results show that fixed-grid architectures outperform grid-invariant architectures, indicating a need for further architectural developments in grid-invariant models such as neural operators. We therefore propose a hybrid system that combines the strong performance of fixed-grid models with the flexibility of grid-invariant architectures. We further show that multi-step fine-tuning is essential for most deep-learning models to work well in practice, consistent with common practice in prior work. Pretraining objectives degrade performance in comparison to supervised training, while image-based pretrained models provide useful inductive biases in some cases in comparison to training the model from scratch. Interestingly, we see a strong positive effect of using a larger dataset when training a smaller model, as compared to training on a smaller dataset for longer. Larger models, on the other hand, primarily benefit from an increase in the computational budget alone. We believe that these results will aid in the design of better weather forecasting systems in the future.
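The idea behind multi-step fine-tuning can be sketched as an autoregressive rollout loss: the model is fed its own predictions for several steps, and the error is averaged over the rollout. The snippet below is a minimal NumPy illustration, not the paper's training code. `rollout_loss` and `toy_step` are hypothetical names, mean squared error is an assumed choice of loss, and a real setup would use an autodiff framework so that gradients flow through every step of the rollout.

```python
import numpy as np


def rollout_loss(step_fn, initial_state, targets):
    """Mean squared error averaged over an autoregressive rollout.

    step_fn maps a state to the predicted next state; each prediction is
    fed back in as the input for the following step.
    """
    state = initial_state
    per_step = []
    for target in targets:
        state = step_fn(state)  # the model consumes its own prediction
        per_step.append(np.mean((state - target) ** 2))
    return float(np.mean(per_step))


def toy_step(state):
    """Hypothetical one-step model that halves every value."""
    return 0.5 * state


initial = np.ones((2, 4, 8))  # assumed layout: (channels, lat, lon)

# Targets that match the toy dynamics exactly give zero loss.
perfect_targets = [np.full_like(initial, 0.5), np.full_like(initial, 0.25)]
loss_perfect = rollout_loss(toy_step, initial, perfect_targets)

# All-zero targets: per-step errors are 0.5**2 and 0.25**2.
zero_targets = [np.zeros_like(initial), np.zeros_like(initial)]
loss_zero = rollout_loss(toy_step, initial, zero_targets)
```

Single-step training corresponds to a rollout of length one. Longer rollouts penalize errors that compound when the model is applied repeatedly at inference time.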
