EWMoE: An effective model for global weather forecasting with mixture-of-experts
Lihao Gan, Xin Man, Chenghong Zhang, Jie Shao
TL;DR
The paper tackles fast, accurate global weather forecasting under data and compute constraints. It introduces EWMoE, a ViT-based encoder–decoder augmented with a 3D absolute position embedding, a sparse Mixture-of-Experts (MoE) layer, and two specialized losses (load-balancing and position-aware). Trained on only two years of ERA5 data, EWMoE outperforms FourCastNet and ClimaX and is competitive with Pangu-Weather and GraphCast on short-range forecasts while using far less data and GPU time. Ablations identify the MoE architecture and the 3D embedding as key contributors. The work suggests that combining geometry-aware representations with sparse, scalable capacity can improve both forecast accuracy and resource efficiency, with potential applicability to broader climate modeling tasks.
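To make the idea of a sparse MoE layer with a load-balancing term concrete, the following is a minimal NumPy sketch, not the authors' implementation. The function names (`softmax`, `moe_forward`), the single-matrix `tanh` experts, and the top-k routing are illustrative assumptions. The auxiliary loss shown is the Switch Transformer-style balancing term, swapped in here because this summary does not specify EWMoE's exact loss.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def moe_forward(tokens, w_gate, expert_weights, top_k=2):
    """Hypothetical sparse MoE layer (illustrative only).

    tokens:         (n_tokens, d) input token embeddings
    w_gate:         (d, n_experts) router weights
    expert_weights: (n_experts, d, d), one toy linear+tanh expert each
    Returns the mixed output and a load-balancing auxiliary loss.
    """
    n_tokens = tokens.shape[0]
    n_experts = w_gate.shape[1]

    # Router: probability of each expert for each token.
    probs = softmax(tokens @ w_gate)                       # (n_tokens, n_experts)
    top_idx = np.argsort(-probs, axis=-1)[:, :top_k]       # (n_tokens, top_k)
    top_p = np.take_along_axis(probs, top_idx, axis=-1)    # (n_tokens, top_k)
    top_p = top_p / top_p.sum(axis=-1, keepdims=True)      # renormalize over chosen experts

    # Sparse dispatch: each expert only processes the tokens routed to it.
    out = np.zeros_like(tokens)
    for e in range(n_experts):
        token_ids, slot = np.nonzero(top_idx == e)
        if token_ids.size == 0:
            continue
        expert_out = np.tanh(tokens[token_ids] @ expert_weights[e])
        out[token_ids] += top_p[token_ids, slot][:, None] * expert_out

    # Assumed Switch-style balancing loss: n_experts * sum_e f_e * P_e, where
    # f_e is the fraction of tokens whose top-1 choice is expert e and
    # P_e is the mean router probability assigned to expert e.
    frac = np.bincount(top_idx[:, 0], minlength=n_experts) / n_tokens
    mean_prob = probs.mean(axis=0)
    aux_loss = n_experts * np.sum(frac * mean_prob)
    return out, aux_loss
```

In a training loop, `aux_loss` would typically be added to the forecasting loss with a small coefficient so that the router is discouraged from sending most tokens to a few experts.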
Abstract
Weather forecasting is a crucial task for meteorological research, with direct social and economic impacts. Recently, data-driven weather forecasting models based on deep learning have shown great potential, achieving superior performance compared with traditional numerical weather prediction methods. However, these models often require massive training data and computational resources. In this paper, we propose EWMoE, an effective model for accurate global weather forecasting that requires significantly less training data and fewer computational resources. Our model incorporates three key components to enhance prediction accuracy: a 3D absolute position embedding, a core Mixture-of-Experts (MoE) layer, and two specific loss functions. We conduct our evaluation on the ERA5 dataset using only two years of training data. Extensive experiments demonstrate that EWMoE outperforms current models such as FourCastNet and ClimaX at all forecast lead times, and achieves performance competitive with the state-of-the-art models Pangu-Weather and GraphCast on evaluation metrics such as Anomaly Correlation Coefficient (ACC) and Root Mean Square Error (RMSE). Additionally, ablation studies indicate that applying the MoE architecture to weather forecasting offers significant advantages in improving accuracy and resource efficiency. Code is available at https://github.com/Tomoyi/EWMoE.
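For readers unfamiliar with the two metrics named above, the sketch below shows one common way to compute them on a regular latitude-longitude grid. It assumes the latitude-weighted definitions widely used for ERA5 benchmarks; the abstract does not state the exact formulas, so the weighting, the function names, and the `(n_lat, n_lon)` array layout are assumptions rather than the authors' evaluation code.

```python
import numpy as np

def latitude_weights(lat_deg):
    # Assumed weighting: cos(latitude), normalized to mean 1, so that
    # small polar grid cells do not dominate the global average.
    w = np.cos(np.deg2rad(lat_deg))
    return w / w.mean()

def weighted_rmse(forecast, truth, lat_deg):
    """Latitude-weighted RMSE for fields of shape (n_lat, n_lon)."""
    w = latitude_weights(lat_deg)[:, None]
    return float(np.sqrt(np.mean(w * (forecast - truth) ** 2)))

def weighted_acc(forecast, truth, climatology, lat_deg):
    """Latitude-weighted ACC: correlation between forecast and observed
    anomalies, both taken relative to the same climatology field."""
    w = latitude_weights(lat_deg)[:, None]
    f_anom = forecast - climatology
    t_anom = truth - climatology
    num = np.sum(w * f_anom * t_anom)
    den = np.sqrt(np.sum(w * f_anom ** 2) * np.sum(w * t_anom ** 2))
    return float(num / den)
```

Lower RMSE is better, while ACC ranges from -1 to 1 and higher is better: a forecast whose anomalies exactly match the observed anomalies gives an ACC of 1.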
