PuYun: Medium-Range Global Weather Forecasting Using Large Kernel Attention Convolutional Networks
Shengchen Zhu, Yiming Chen, Peiying Yu, Xiang Qu, Yuxiao Zhou, Yiming Ma, Zhizhan Zhao, Yukai Liu, Hao Mi, Bin Wang
TL;DR
PuYun addresses the demand for accurate global medium-range weather forecasts at higher spatial resolution by introducing a convolutional model with large kernel attention (LKA-FCN) and a cascade autoregressive training strategy. PuYun-Short alone outperforms state-of-the-art ML models on 10-day forecasts for key variables, and the cascaded PuYun improves accuracy further, while fine-tuning enables resolution expansion to $0.1^{\circ}$. The results, obtained on ERA5 data, highlight the model's ability to capture local interactions with an extended receptive field and to mitigate accumulated errors through cascading and dynamic training. The work also outlines a scalable training pipeline and envisions end-to-end forecasting from global observations at higher resolutions, with open-source code forthcoming.
Abstract
Accurate weather forecasting is essential for understanding and mitigating weather-related impacts. In this paper, we present PuYun, an autoregressive cascade model built on large kernel attention convolutional networks. The design supports extended prediction horizons while broadening the effective receptive field. Integrating large kernel attention into the convolutional layers improves the model's ability to capture fine-grained spatial detail, and with it the predictive accuracy for meteorological phenomena. PuYun comprises two models: PuYun-Short for 0-5 day forecasts and PuYun-Medium for 5-10 day forecasts. Cascading them improves the accuracy of 10-day forecasts. Our evaluation shows that PuYun-Short alone outperforms both GraphCast and FuXi-Short on 10-day forecasts. Specifically, on the 10th day, PuYun-Short attains an RMSE for Z500 of 720 $m^2/s^2$, compared to 732 $m^2/s^2$ for GraphCast and 740 $m^2/s^2$ for FuXi-Short, and an RMSE for T2M of 2.60 K, compared to 2.63 K for GraphCast and 2.65 K for FuXi-Short. Furthermore, cascading PuYun-Short and PuYun-Medium outperforms the combination of FuXi-Short and FuXi-Medium: on the 10th day, the RMSE for Z500 is further reduced to 638 $m^2/s^2$, compared to 641 $m^2/s^2$ for FuXi. These findings underscore the effectiveness of our cascaded models in advancing medium-range weather prediction. Our training code and model will be open-sourced.
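The cascade described in the abstract can be illustrated with a minimal sketch of an autoregressive rollout that hands over from a short-range model to a medium-range model after day 5. This is not the authors' code: the function `cascade_rollout`, its parameters, the 6-hour step length, and the toy stand-in models are all assumptions made for illustration, and a real state would be a gridded tensor rather than a list of floats.

```python
from typing import Callable, List

State = List[float]                 # hypothetical stand-in for a gridded atmospheric state
StepFn = Callable[[State], State]   # one autoregressive step: state(t) -> state(t + step)


def cascade_rollout(
    initial_state: State,
    short_model: StepFn,
    medium_model: StepFn,
    step_hours: int = 6,           # assumed step length; not stated in the abstract
    switch_hours: int = 5 * 24,    # hand over from short to medium model after day 5
    total_hours: int = 10 * 24,    # 10-day forecast horizon
) -> List[State]:
    """Roll a forecast forward step by step, switching models at `switch_hours`."""
    if switch_hours % step_hours or total_hours % step_hours:
        raise ValueError("horizons must be multiples of step_hours")
    states: List[State] = []
    state = initial_state
    for lead in range(step_hours, total_hours + 1, step_hours):
        # Lead times up to day 5 use the short model, later ones the medium model.
        model = short_model if lead <= switch_hours else medium_model
        state = model(state)       # each output becomes the next input
        states.append(state)
    return states


# Toy stand-ins for the two models, chosen so the hand-over is visible in the output.
toy_short = lambda s: [x + 1.0 for x in s]
toy_medium = lambda s: [x + 10.0 for x in s]

forecast = cascade_rollout([0.0], toy_short, toy_medium)
print(len(forecast))   # number of forecast steps over 10 days
print(forecast[19])    # state at day 5, produced by the short model only
print(forecast[-1])    # state at day 10, after the medium model takes over
```

Because every step consumes the previous step's output, errors compound with lead time. Splitting the horizon lets each model specialize in its own lead-time range, which is the motivation the abstract gives for the cascade.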
