PuYun: Medium-Range Global Weather Forecasting Using Large Kernel Attention Convolutional Networks

Shengchen Zhu; Yiming Chen; Peiying Yu; Xiang Qu; Yuxiao Zhou; Yiming Ma; Zhizhan Zhao; Yukai Liu; Hao Mi; Bin Wang

TL;DR

PuYun addresses the demand for accurate global medium-range weather forecasts at higher spatial resolution by introducing a convolutional model with large kernel attention (LKA-FCN) and a cascade autoregressive training strategy. PuYun-Short outperforms state-of-the-art ML models on 10-day forecasts for key variables, the PuYun-Short and PuYun-Medium cascade improves accuracy further, and fine-tuning enables resolution expansion to $0.1^{\circ}$. The results highlight the model's ability to capture local interactions with extended receptive fields and to mitigate error accumulation through cascading and dynamic training, with improvements demonstrated on ERA5 data. The work also outlines a scalable training pipeline and envisions end-to-end forecasting from global observations at higher resolutions, with open-source code forthcoming.

Abstract

Accurate weather forecasting is essential for understanding and mitigating weather-related impacts. In this paper, we present PuYun, an autoregressive cascade model that leverages large kernel attention convolutional networks. The model's design inherently supports extended weather prediction horizons while broadening the effective receptive field. Integrating large kernel attention into the convolutional layers enhances the model's capacity to capture fine-grained spatial details, thereby improving its predictive accuracy for meteorological phenomena. PuYun comprises PuYun-Short for 0-5 day forecasts and PuYun-Medium for 5-10 day forecasts, an arrangement that improves the accuracy of 10-day weather forecasting. In our evaluation, PuYun-Short alone surpasses both GraphCast and FuXi-Short in generating accurate 10-day forecasts. Specifically, on the 10th day, PuYun-Short reduces the RMSE for Z500 to 720 $m^2/s^2$, compared to 732 $m^2/s^2$ for GraphCast and 740 $m^2/s^2$ for FuXi-Short. Additionally, the RMSE for T2M is reduced to 2.60 K, compared to 2.63 K for GraphCast and 2.65 K for FuXi-Short. Furthermore, cascading PuYun-Short and PuYun-Medium achieves better results than the combination of FuXi-Short and FuXi-Medium. On the 10th day, the RMSE for Z500 is further reduced to 638 $m^2/s^2$, compared to 641 $m^2/s^2$ for FuXi. These findings underscore the effectiveness of our cascaded models in advancing medium-range weather prediction. Our training code and model will be open-sourced.
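
The cascade described in the abstract can be illustrated with a minimal sketch: each stage is rolled out autoregressively, and the last state produced by the short-range stage becomes the starting point for the medium-range stage. Everything named here is hypothetical and chosen for illustration only: `cascade_rollout`, the toy `short_model` and `medium_model` functions, the single-state input, and the assumed 6-hour step (20 steps per 5-day stage) are not taken from the paper.

```python
from typing import Callable, List, Sequence, Tuple

State = List[float]                 # stand-in for a gridded atmospheric state
StepFn = Callable[[State], State]   # one forecast step: state at t -> state at t+1


def cascade_rollout(
    initial_state: State,
    stages: Sequence[Tuple[StepFn, int]],
) -> List[State]:
    """Roll out each stage autoregressively and hand its last state to the next stage."""
    state = initial_state
    trajectory: List[State] = []
    for step_fn, n_steps in stages:
        for _ in range(n_steps):
            state = step_fn(state)      # feed the model's own output back in
            trajectory.append(state)
    return trajectory


# Toy stand-ins for the two stage models (hypothetical, not the paper's networks).
def short_model(state: State) -> State:
    return [x + 1.0 for x in state]


def medium_model(state: State) -> State:
    return [x + 0.5 for x in state]


STEPS_PER_STAGE = 20  # assumption: 5 days at a 6-hour step

forecast = cascade_rollout(
    [0.0, 10.0],
    [(short_model, STEPS_PER_STAGE), (medium_model, STEPS_PER_STAGE)],
)
```

In this sketch the only coupling between stages is the hand-off state, which is why a stage tuned for days 5-10 can be swapped in without touching the short-range model.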

Paper Structure (21 sections, 4 equations, 6 figures, 2 tables)


Introduction
Method
PuYun Model
Patch embedding
LKA-FCN layers
Patch merging
Autoregressive Forecasting
Loss Function
Datasets
Implementation Details
Training Procedure
Model Evaluation
Experimental Results
Quantitative Comparison
Quantitative Skill Evaluation
...and 6 more sections

Figures (6)

Figure 1: Overview of our proposed PuYun model. It consists of three core components: patch embedding, LKA-FCN layers (the dashed box above), and patch merging (the dashed box below). The symbol $\otimes$ denotes concatenation and $\oplus$ denotes addition. Skip connections are depicted by thin arrowed lines.
Figure 2: Basic block in LKA-FCN Layers. Each LKA-FCN layer consists of a series of stacked blocks. Each block has a dimension of 1536 with a kernel size of K. The symbol $\otimes$ means Hadamard product.
Figure 3: Pipeline of PuYun for global weather forecasting. PuYun-Short generates forecasts for 0-5 days using autoregression, while PuYun-Medium generates forecasts for 5-10 days in a similar manner with the outputs of PuYun-Short.
Figure 4: Forecast ACC comparison among Pangu, GraphCast, FuXi-Short and PuYun-Short for an array of meteorological elements over a 10-day forecast period. Higher numerical values indicate better performance.
Figure 6: Forecast images and absolute error for Z500. Figures of Z500 on days 3, 5, and 10 are presented with initialization time at 2018-03-30 00:00 UTC.
...and 1 more figure
