Utilizing Strategic Pre-training to Reduce Overfitting: Baguan -- A Pre-trained Weather Forecasting Model

Peisong Niu, Ziqing Ma, Tian Zhou, Weiqi Chen, Lefei Shen, Rong Jin, Liang Sun

TL;DR

This work tackles overfitting in data-limited weather forecasting by introducing Baguan, a transformer-based model pre-trained with Siamese MAE to inject locality bias. The authors demonstrate that carefully crafted pre-training, followed by a three-stage fine-tuning protocol and ensemble strategies, yields robust generalization and strong performance on global medium-range forecasts, S2S tasks, and high-resolution regional forecasting on ERA5-derived data. They provide both empirical evidence and theoretical intuition showing pre-training acts as a regularizer that emphasizes leading covariance subspaces, enabling better handling of scarce atmospheric data. The results show Baguan consistently outperforms strong baselines such as Pangu-Weather and IFS, with notable gains in long-lead forecasts and regional accuracy, underscoring the practical impact of pre-training as a foundation for downstream meteorological applications.

Abstract

Weather forecasting has long posed a significant challenge for humanity. While recent AI-based models have surpassed traditional numerical weather prediction (NWP) methods in global forecasting tasks, overfitting remains a critical issue due to the limited availability of real-world weather data spanning only a few decades. Unlike fields like computer vision or natural language processing, where data abundance can mitigate overfitting, weather forecasting demands innovative strategies to address this challenge with existing data. In this paper, we explore pre-training methods for weather forecasting, finding that selecting an appropriately challenging pre-training task introduces locality bias, effectively mitigating overfitting and enhancing performance. We introduce Baguan, a novel data-driven model for medium-range weather forecasting, built on a Siamese Autoencoder pre-trained in a self-supervised manner and fine-tuned for different lead times. Experimental results show that Baguan outperforms traditional methods, delivering more accurate forecasts. Additionally, the pre-trained Baguan demonstrates robust overfitting control and excels in downstream tasks, such as subseasonal-to-seasonal (S2S) modeling and regional forecasting, after fine-tuning.

Paper Structure

This paper contains 52 sections, 33 equations, 11 figures, 7 tables.

Figures (11)

  • Figure 1: Overview of Baguan's architecture and its three training stages. The initial state $\mathbf{X}^{t_0}$ and a masked state $\mathbf{X}^{t_0+\Delta t}_{\text{masked}}$ are each aggregated into a single channel and fed into a weight-sharing encoder. The two representations are then combined through a cross-self decoder and a prediction head to produce the reconstruction or forecasting result $\mathbf{X}^{t_0+\Delta t}$.
  • Figure 2: (Left): Comparison of relative RMSE for 6-hour forecasts across various weather variables, including Z500, T2M, T850, and U10. The RMSE values are presented relative to the baseline model without pre-training. Two pre-training methods are evaluated: Baguan (Siamese MAE) and MAE. (Right): Comparison of relative RMSE across masking ratios ranging from 0.5 to 0.99. The RMSE values are presented relative to the version without pre-training (w/o PT).
  • Figure 3: The total spectrum energy of top k% attention scores. The total number of tokens is about 16,000. The sum of energy is 1.
  • Figure 4: Comparison of global latitude-weighted RMSE and ACC against forecast lead time for Baguan (green), Pangu-Weather (orange) and IFS (blue). Key variables analyzed include T2M, U10, T500, U500, Z500 and T850. More detailed results can be found in Appendix \ref{app:main_res}.
  • Figure 5: (a) The training loss and validation ACC of Baguan-S, Baguan-P and FuXi (a Swin-based model) on the S2S task. *-S denotes training from scratch and *-P denotes pre-training. (b-c) The TCC and PCC examined for Baguan-S2S and ECMWF-S2S models in forecasting T2M and Z500. *-15 and *-24 indicate a lead time of 15 and 42 days, respectively. Higher values of both TCC and PCC indicate better performance. The detailed results can be found in guo2024maximizing.
  • ...and 6 more figures
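
The Figure 4 caption refers to latitude-weighted RMSE without defining it in this section. The sketch below illustrates one common convention, assumed here rather than taken from the paper: each grid row is weighted by the cosine of its latitude, normalized so the weights average to 1. This keeps the densely sampled polar rows of a regular latitude-longitude grid from dominating the score. The function names `latitude_weights` and `lat_weighted_rmse` and the nested-list layout `[n_lat][n_lon]` are hypothetical choices made for illustration.

```python
import math


def latitude_weights(lats_deg):
    """Cosine-latitude weights, normalized so that their mean is 1.

    Assumption: this normalization is a common convention for
    regular lat-lon grids; the paper's exact definition is not
    stated in this section.
    """
    cos_vals = [math.cos(math.radians(lat)) for lat in lats_deg]
    mean_cos = sum(cos_vals) / len(cos_vals)
    return [c / mean_cos for c in cos_vals]


def lat_weighted_rmse(pred, truth, lats_deg):
    """Latitude-weighted RMSE for one field at one time step.

    pred, truth: nested lists of shape [n_lat][n_lon].
    lats_deg:    latitude (degrees) of each grid row.
    """
    weights = latitude_weights(lats_deg)
    n_lon = len(pred[0])
    total = 0.0
    for w, pred_row, truth_row in zip(weights, pred, truth):
        for p, t in zip(pred_row, truth_row):
            # Squared error scaled by the weight of this latitude row.
            total += w * (p - t) ** 2
    return math.sqrt(total / (len(lats_deg) * n_lon))


if __name__ == "__main__":
    lats = [-60.0, 0.0, 60.0]
    truth = [[0.0, 0.0] for _ in lats]
    # Same error magnitude placed at the equator vs. at a high latitude.
    err_equator = [[0.0, 0.0], [1.0, 1.0], [0.0, 0.0]]
    err_high_lat = [[1.0, 1.0], [0.0, 0.0], [0.0, 0.0]]
    print(lat_weighted_rmse(err_equator, truth, lats))
    print(lat_weighted_rmse(err_high_lat, truth, lats))
```

Under this weighting, an error at the equator contributes more to the score than an error of the same size at 60° latitude, which matches the larger surface area each equatorial grid cell represents.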