Utilizing Strategic Pre-training to Reduce Overfitting: Baguan -- A Pre-trained Weather Forecasting Model
Peisong Niu, Ziqing Ma, Tian Zhou, Weiqi Chen, Lefei Shen, Rong Jin, Liang Sun
TL;DR
This work tackles overfitting in data-limited weather forecasting by introducing Baguan, a transformer-based model pre-trained with a Siamese MAE objective that injects locality bias. The authors demonstrate that a carefully chosen pre-training task, followed by a three-stage fine-tuning protocol and ensemble strategies, yields robust generalization and strong performance on global medium-range forecasting, S2S tasks, and high-resolution regional forecasting with ERA5-derived data. They provide both empirical evidence and theoretical intuition that pre-training acts as a regularizer emphasizing the leading covariance subspaces, which helps the model cope with scarce atmospheric data. The results show that Baguan consistently outperforms strong baselines such as Pangu-Weather and IFS, with notable gains in long-lead forecasts and regional accuracy, underscoring the practical value of pre-training as a foundation for downstream meteorological applications.
Abstract
Weather forecasting has long posed a significant challenge for humanity. While recent AI-based models have surpassed traditional numerical weather prediction (NWP) methods in global forecasting tasks, overfitting remains a critical issue because real-world weather data span only a few decades. Unlike fields such as computer vision or natural language processing, where data abundance can mitigate overfitting, weather forecasting demands innovative strategies to address this challenge with existing data. In this paper, we explore pre-training methods for weather forecasting and find that selecting an appropriately challenging pre-training task introduces locality bias, which effectively mitigates overfitting and enhances performance. We introduce Baguan, a novel data-driven model for medium-range weather forecasting, built on a Siamese Autoencoder that is pre-trained in a self-supervised manner and fine-tuned for different lead times. Experimental results show that Baguan outperforms traditional methods, delivering more accurate forecasts. Additionally, the pre-trained Baguan demonstrates robust overfitting control and, after fine-tuning, excels in downstream tasks such as subseasonal-to-seasonal (S2S) modeling and regional forecasting.
