Diffusion Models Bridge Deep Learning and Physics in ENSO Forecasting

Weifeng Xu, Xiang Zhu, Xiaoyong Li, Qiang Yao, Xiaoli Ren, Kefeng Deng, Song Wu, Chengcheng Shao, Xiaolong Xu, Juan Zhao, Chengwu Zhao, Jianping Cao, Jingnan Wang, Wuxin Wang, Qixiu Li, Xiaori Gao, Xinrong Wu, Huizan Wang, Xiaoqun Cao, Weiming Zhang, Junqiang Song, Kaijun Ren

TL;DR

The paper introduces a conditional diffusion model for ENSO forecasting that treats future SST as a probability distribution conditioned on six months of history, enabling explicit uncertainty quantification and long-range forecasts. Through a physics-guided reverse-time SDE, the model uncovers a mechanistic link to the recharge-discharge ENSO oscillator, consistent with the van der Pol framework, thereby marrying data-driven prediction with deterministic dynamics. Key findings include extended lead times of up to ~26–30 months with competitive skill, improved 21st-century forecasts via observation-based training, and the ability to reproduce extreme events and early spring predictability barrier (SPB) signals through ensemble uncertainty. This work offers a transferable, interpretable probabilistic forecasting paradigm for complex geophysical systems and demonstrates how diffusion models can encode fundamental physical processes while delivering practical predictive performance.

Abstract

Accurate long-range forecasting of the El Niño-Southern Oscillation (ENSO) is vital for global climate prediction and disaster risk management. Yet limited understanding of ENSO's physical mechanisms constrains both numerical and deep learning approaches, which often struggle to balance predictive accuracy with physical interpretability. Here, we introduce a data-driven model for ENSO prediction based on a conditional diffusion model. By constructing a probabilistic mapping from historical to future states using a higher-order Markov chain, our model explicitly quantifies intrinsic uncertainty. The approach extends the lead times achieved by state-of-the-art methods, resolves early development signals of the spring predictability barrier, and faithfully reproduces the spatiotemporal evolution of historical extreme events. Most strikingly, our analysis reveals that the reverse diffusion process inherently encodes the classical recharge-discharge mechanism, with its operational dynamics exhibiting remarkable consistency with the governing principles of the van der Pol oscillator equation. These findings establish diffusion models as a new paradigm for ENSO forecasting, offering not only superior probabilistic skill but also a physically grounded theoretical framework that bridges data-driven prediction with deterministic dynamical systems, thereby advancing the study of complex geophysical processes.
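
For orientation, the two mathematical objects named in the abstract can be written in their standard textbook forms. This is a reference sketch only: the paper's own equations are not reproduced on this page and may use different notation or additional terms. The symbols $x$, $\mu$, $f$, $g$, $p_t$, $c$, and $\bar{w}$ are assumptions introduced here for illustration.

```latex
% van der Pol oscillator, standard form
% (x: oscillating state, \mu \ge 0: strength of the nonlinear damping)
\ddot{x} - \mu\,(1 - x^{2})\,\dot{x} + x = 0

% Generic reverse-time SDE associated with a forward diffusion
%   dx = f(x,t) dt + g(t) dw
% (\bar{w}: reverse-time Wiener process; c: conditioning history,
%  assumed here to enter through the conditional score)
\mathrm{d}x = \left[ f(x,t) - g(t)^{2}\,\nabla_{x} \log p_{t}(x \mid c) \right] \mathrm{d}t
            + g(t)\,\mathrm{d}\bar{w}
```

In the first equation the damping term changes sign with amplitude, which is what produces a self-sustained oscillation. In the second, the score term $\nabla_{x} \log p_{t}$ is the quantity a diffusion model learns.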

Paper Structure

This paper contains 12 sections, 18 equations, 9 figures, 1 table.

Figures (9)

  • Figure 1: Triadic Linkage of Recharge-Discharge Physics, Chaotic Dynamics, and Diffusion Reverse-Time SDE for Interpretable ENSO Prediction.
  • Figure 2: Performance of the diffusion model in 30-month-ahead ENSO prediction from 1980 to 2021. A-B: Average evolution of evaluation metrics across lead times (1-30 months) during 1980-2021. Dots mark skill scores; filled dots indicate values exceeding 0.5. Squares denote the mean absolute error (MAE) of SSTA; filled squares correspond to MAE below 0.5$^{\circ}$C. C: Comparison of Niño3.4 index time series between the IAPV4 dataset and NOAA observations, testing whether IAPV4 reproduces the observed ENSO amplitude, phase, and decadal swings over 1980-2021. The two curves show strong agreement, with an overall correlation coefficient of 0.9617. D (20th century) and E (21st century): Heatmaps of absolute error for every Niño3.4 sample. F (20th century) and G (21st century): Heatmaps of correlation coefficients for all samples. In D-G, the x-axis indicates lead time and the y-axis represents prediction start time.
  • Figure 3: Prediction Performance Comparison Between Pre-trained and Non-Pre-trained Models (2000-2023). A-C: Average evolution of evaluation metrics across lead times (1-30 months) during 2000-2023. A: filled dots mark correlation coefficients $>$ 0.5. B: filled dots indicate the maximum absolute Niño3.4 index error in each forecast. C: filled dots denote the maximum spatial correlation coefficient achieved in each prediction. D: Difference (without-CMIP6 minus with-CMIP6) for each metric, highlighting the impact of CMIP6 training data.
  • Figure 4: Model Performance under the Spring Predictability Barrier for ENSO. Red shading marks spring. A-E: Hindcast skill from January-June 2000 initial conditions for the period July 2000-July 2010; $n$ denotes the ensemble size. F-Q: Distributions of the monthly Niño3.4 index produced with 100-member ensembles ($n$ = 100). Histograms and probability-density curves are obtained by kernel-density estimation (bandwidth = 0.2). R: Kurtosis for July 2000-June 2003. The blue line shows the spring-average kurtosis; red symbols indicate values for the February immediately preceding the Spring Predictability Barrier.
  • Figure 5: Forecast results for the extreme cases. A: 2015/16 super El Niño; B: 2020-22 triple La Niña. A.1 and B.1 show the predicted Niño3.4 index; A.2-3 and B.2-3 display the equatorial spatio-temporal evolution of SST anomalies.
  • ...and 4 more figures
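
The captions above refer to three diagnostics: correlation skill and MAE as a function of lead time (Figures 2 and 3), and the kurtosis of the ensemble distribution (Figure 4). The sketch below shows one common way to compute them for a Niño3.4 index. It is not the authors' code: the function names, the input layout (one row per forecast start, one column per lead month), and the choice of excess kurtosis are all assumptions made for illustration.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def mean_absolute_error(x, y):
    """Mean absolute difference between forecast and verifying values."""
    return sum(abs(a - b) for a, b in zip(x, y)) / len(x)


def skill_by_lead(forecasts, observations):
    """Return [(lead_month, correlation, mae), ...].

    Assumed layout (hypothetical): forecasts[i][k] is the index forecast
    from start date i at lead k+1 months, and observations[i][k] is the
    verifying observed value for that same target month.
    """
    n_leads = len(forecasts[0])
    results = []
    for k in range(n_leads):
        f = [row[k] for row in forecasts]
        o = [row[k] for row in observations]
        results.append((k + 1, pearson(f, o), mean_absolute_error(f, o)))
    return results


def excess_kurtosis(members):
    """Excess kurtosis (m4 / m2**2 - 3) of one month's ensemble members.

    Whether the paper reports plain or excess kurtosis is not stated in
    the caption; excess kurtosis is an assumption here.
    """
    n = len(members)
    mean = sum(members) / n
    m2 = sum((v - mean) ** 2 for v in members) / n
    m4 = sum((v - mean) ** 4 for v in members) / n
    return m4 / (m2 ** 2) - 3.0


# Toy data: three forecast starts, two lead months.
toy_forecasts = [[1.0, 2.0], [2.0, 1.0], [3.0, 3.0]]
toy_observations = [[1.0, 2.5], [2.0, 1.5], [3.0, 3.5]]
toy_skill = skill_by_lead(toy_forecasts, toy_observations)
```

In this toy case the lead-2 forecasts carry a constant 0.5 bias, so the correlation stays perfect while the MAE does not. This is why captions such as Figure 2 report both metrics side by side.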