Maximizing the Impact of Deep Learning on Subseasonal-to-Seasonal Climate Forecasting: The Essential Role of Optimization

Yizhen Guo, Tian Zhou, Wanyi Jiang, Bo Wu, Liang Sun, Rong Jin

TL;DR

This research contests a recent study's claim that direct forecasting outperforms rolling forecasting for S2S tasks, and proposes that the underperformance of rolling forecasting may arise from the accumulation of Jacobian matrix products during training.

Abstract

Weather and climate forecasting is vital for sectors such as agriculture and disaster management. Although numerical weather prediction (NWP) systems have advanced, forecasting at the subseasonal-to-seasonal (S2S) scale, spanning 2 to 6 weeks, remains challenging because atmospheric signals at this range are chaotic and sparse. Even state-of-the-art deep learning models struggle to outperform simple climatology models in this domain. This paper identifies optimization, rather than network structure, as a likely root cause of this performance gap, and develops a novel multi-stage optimization strategy to close it. Extensive empirical studies demonstrate that our multi-stage optimization approach significantly improves the key skill metrics, PCC and TCC, while using the same backbone structure, surpassing the state-of-the-art NWP system (ECMWF-S2S) by 19-91%. Our research contests a recent study's claim that direct forecasting outperforms rolling forecasting for S2S tasks. Through theoretical analysis, we propose that the underperformance of rolling forecasting may arise from the accumulation of Jacobian matrix products during training. Our multi-stage framework can be viewed as a form of teacher forcing that addresses this issue. Code is available at https://anonymous.4open.science/r/Baguan-S2S-23E7/
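
The Jacobian-accumulation argument in the abstract can be made concrete with a short derivation. This is a minimal sketch in our own notation, not a reproduction of the paper's equations: it assumes a one-step model $F_\theta$ rolled out as $\hat{X}_{t+1} = F_\theta(\hat{X}_t)$ with $\hat{X}_0 = X_0$, a generic loss $\ell$ on the final step (an assumed symbol), and $J_j$ for the input Jacobian at step $j$.

```latex
% Rollout: \hat{X}_{t+1} = F_\theta(\hat{X}_t), \quad \hat{X}_0 = X_0
% Input Jacobian: J_j = \partial F_\theta(\hat{X}_j) / \partial \hat{X}_j
\frac{\partial \ell(\hat{X}_T, X_T)}{\partial \theta}
  = \frac{\partial \ell}{\partial \hat{X}_T}
    \sum_{k=1}^{T}
    \left( J_{T-1} J_{T-2} \cdots J_{k} \right)
    \left. \frac{\partial F_\theta(X)}{\partial \theta} \right|_{X = \hat{X}_{k-1}}

% For k = T the product is empty (the identity).
% The k = 1 term contains T-1 Jacobian factors, so its norm can grow or
% shrink geometrically with T.

% Teacher forcing at step s replaces the input \hat{X}_s by the observation
% X_s, which does not depend on \theta. All terms with k \le s vanish:
\frac{\partial \ell(\hat{X}_T, X_T)}{\partial \theta}
  = \frac{\partial \ell}{\partial \hat{X}_T}
    \sum_{k=s+1}^{T}
    \left( J_{T-1} \cdots J_{k} \right)
    \left. \frac{\partial F_\theta(X)}{\partial \theta} \right|_{X = \hat{X}_{k-1}}
% The longest remaining product has T-1-s factors instead of T-1.
```

Under these assumptions, forcing more often shortens the longest Jacobian product, and forcing less often lengthens the effective rollout horizon, which is one way to read the gradual extension of $T$ across stages.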

Paper Structure

This paper contains 25 sections, 17 equations, 23 figures, 1 table.

Figures (23)

  • Figure 1: Left: comparison of the training processes of the multi-stage method and the naive method. As $T$ increases, the naive method exhibits a significant discrepancy in the model state, which leads to gradient divergence. Our method gradually increases $T$ while reducing the discrepancy in model states at each stage, thereby lowering the training difficulty. Right: comparison of training stability between the naive training method and our multi-stage training method.
  • Figure 2: Principle of multi-stage progressive learning. Left: structure of our base model, Baguan [buguanwebsite]. Baguan employs a Siamese MAE method [gupta2023siamese] to pre-train a ViT-structured weather forecasting model. Right: demonstration of teacher forcing [hess2023generalized]. Teacher forcing substitutes the observed values $X_t$ for the predicted values $\hat{X}_t$, thereby interrupting the multiplicative path of the Jacobian $J$. We gradually extend $T$ by controlling the frequency of teacher forcing, thereby enabling long-term rolling optimization of the model.
  • Figure 3: Comparison between direct prediction and different rolling prediction methods. In direct prediction, a separate model is employed for each lead time. The naive rolling method loses its predictive capability during long-term rolling, whereas our optimized approach effectively addresses this issue.
  • Figure 4: Performance of multi-stage training compared with ideal rolling prediction. We use CKA to measure the similarity between the model features $F_{\theta,feat}(X_{41})$ and the features obtained after 41 rolling prediction steps starting from day 0, $F_{\theta,feat}(F_{\theta}^{(41)}(X_{0}))$. In an ideal rolling prediction, the two sets of features would be perfectly aligned, i.e., CKA equals 1. A higher CKA indicates stronger information continuity, allowing the model to achieve better performance in long-term rolling predictions.
  • Figure 5: PCC comparison between ECMWF-S2S and our model. Our model surpasses the state-of-the-art NWP system (ECMWF-S2S) by 19-91%.
  • ...and 18 more figures
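
To illustrate the CKA comparison described in the Figure 4 caption, here is a minimal sketch of a CKA computation. It assumes the linear variant of CKA (the caption does not say which variant the paper uses) and represents features as plain lists of rows, one row per sample. The function names `center_columns`, `cross_gram_sq_norm`, and `linear_cka` are hypothetical and are not taken from the paper's code.

```python
import math


def center_columns(m):
    """Subtract each column's mean so every feature has zero mean."""
    n = len(m)
    means = [sum(col) / n for col in zip(*m)]
    return [[v - mu for v, mu in zip(row, means)] for row in m]


def cross_gram_sq_norm(a, b):
    """Squared Frobenius norm of a^T b for row-major a (n x p) and b (n x q)."""
    total = 0.0
    for col_a in zip(*a):
        for col_b in zip(*b):
            dot = sum(x * y for x, y in zip(col_a, col_b))
            total += dot * dot
    return total


def linear_cka(x, y):
    """Linear CKA between two feature matrices with the same number of rows.

    Assumption: linear kernel. Returns a value in [0, 1]; 1 means the two
    representations match up to rotation and isotropic scaling.
    """
    if len(x) != len(y):
        raise ValueError("x and y must have the same number of samples")
    xc, yc = center_columns(x), center_columns(y)
    numerator = cross_gram_sq_norm(xc, yc)
    denominator = math.sqrt(
        cross_gram_sq_norm(xc, xc) * cross_gram_sq_norm(yc, yc)
    )
    if denominator == 0.0:
        raise ValueError("features have zero variance")
    return numerator / denominator


# Toy stand-ins for F_feat(X_41) and F_feat(F^(41)(X_0)); not real model output.
direct_features = [[1.0, 2.0], [3.0, 1.0], [0.0, 4.0]]
rolled_features = [[2.5 * v for v in row] for row in direct_features]
score = linear_cka(direct_features, rolled_features)
```

In this toy case the rolled-out features are a rescaled copy of the direct features, so the score is 1 up to floating-point error, which corresponds to the ideal rolling prediction described in the caption. In practice the rows would be flattened feature vectors from the model, and a dependency-free implementation like this one would likely be replaced by an array library for speed.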