Improving the Predictability of the Madden-Julian Oscillation at Subseasonal Scales with Gaussian Process Models

Haoyuan Chen, Emil Constantinescu, Vishwas Rao, Cristiana Stan

TL;DR

This work develops a probabilistic, data-driven forecasting framework for the Madden–Julian Oscillation (MJO) using a jointly modeled Gaussian process (GP) on the bivariate MJO indices $[RMM1,RMM2]^T$. By leveraging empirical correlations to construct the GP and applying an a posteriori covariance correction to accommodate iterative, multistep forecasts, the approach provides both point predictions and quantified uncertainty at subseasonal scales. The method yields improved deterministic skill relative to ANN models in the first few lead days and extends probabilistic coverage beyond $\sim$3 weeks, while delivering interpretable uncertainty through 2D confidence ellipsoids. Results indicate strong deterministic performance in early forecasts, positive phase-oriented skill across many MJO phases, and robust uncertainty quantification. Future work aims to incorporate seasonality and additional predictors to further enhance performance and interpretability.
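The iterative, multistep forecasting mentioned above can be sketched as a loop that feeds each one-day-ahead prediction back in as a predictor. This is a minimal illustration, not the authors' implementation: `iterated_forecast`, its arguments, and the stand-in predictor `damped_rotation` are hypothetical names introduced here, and the uncertainty propagation and covariance correction are omitted.

```python
import math
from typing import Callable, List, Tuple

Point = Tuple[float, float]  # (RMM1, RMM2) on one day


def iterated_forecast(
    history: List[Point],
    one_step: Callable[[List[Point]], Point],
    lag: int,
    lead: int,
) -> List[Point]:
    """Roll a one-day-ahead predictor forward `lead` days.

    Each step passes the most recent `lag` values (observed or previously
    predicted) to `one_step`, then slides the window forward by one day.
    """
    if lag < 1 or len(history) < lag:
        raise ValueError("need lag >= 1 and at least `lag` observed days")
    window = list(history[-lag:])
    forecasts: List[Point] = []
    for _ in range(lead):
        nxt = one_step(window)
        forecasts.append(nxt)
        window = window[1:] + [nxt]  # drop the oldest day, append the prediction
    return forecasts


def damped_rotation(window: List[Point]) -> Point:
    """Hypothetical stand-in predictor: rotate the last point and shrink it."""
    x, y = window[-1]
    c, s, r = math.cos(0.1), math.sin(0.1), 0.98
    return (r * (c * x - s * y), r * (s * x + c * y))


if __name__ == "__main__":
    observed = [(1.0, 0.0)] * 3
    for day, (rmm1, rmm2) in enumerate(
        iterated_forecast(observed, damped_rotation, lag=2, lead=5), start=1
    ):
        print(f"lead {day}: RMM1={rmm1:.3f}, RMM2={rmm2:.3f}")
```

Once the lead time exceeds the lag, the window holds only predicted values, so errors in early steps compound in later ones.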

Abstract

The Madden–Julian Oscillation (MJO) is an influential climate phenomenon that plays a vital role in modulating global weather patterns. Although machine learning algorithms such as neural networks have improved MJO predictions, most of them cannot directly provide uncertainty levels for their MJO forecasts. To address this problem, we develop a nonparametric strategy based on Gaussian process (GP) models. We calibrate GPs using empirical correlations and propose an a posteriori covariance correction. Numerical experiments demonstrate that our model has better prediction skill than the ANN models for the first five lead days. Additionally, our a posteriori covariance correction extends the probabilistic coverage by more than three weeks.
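To make "calibrating a GP using empirical correlations" concrete, here is a minimal sketch of Gaussian conditioning with a covariance built from the empirical autocovariance of a series. It is a simplification under stated assumptions: it treats a single stationary series, whereas the paper models RMM1 and RMM2 jointly with cross-correlations, and the function names (`autocovariance`, `solve`, `predict_next`) are hypothetical. It uses only the standard library; in practice a linear-algebra library would replace the hand-written solver.

```python
from typing import List, Tuple


def autocovariance(x: List[float], max_lag: int) -> List[float]:
    """Biased empirical autocovariance c[0..max_lag] (divides by n)."""
    n = len(x)
    mean = sum(x) / n
    d = [v - mean for v in x]
    return [sum(d[t] * d[t + k] for t in range(n - k)) / n
            for k in range(max_lag + 1)]


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a @ w = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    w = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * w[c] for c in range(r + 1, n))
        w[r] = (m[r][n] - tail) / m[r][r]
    return w


def predict_next(x: List[float], lag: int) -> Tuple[float, float]:
    """Conditional mean and variance of the next value given the last `lag` values."""
    c = autocovariance(x, lag)
    mean = sum(x) / len(x)
    # Toeplitz covariance among the predictors, ordered oldest first.
    k_xx = [[c[abs(i - j)] for j in range(lag)] for i in range(lag)]
    # Covariance between each predictor and the target (time distance lag - i).
    k_xs = [c[lag - i] for i in range(lag)]
    w = solve(k_xx, k_xs)
    recent = x[-lag:]
    cond_mean = mean + sum(wi * (v - mean) for wi, v in zip(w, recent))
    cond_var = c[0] - sum(wi * ki for wi, ki in zip(w, k_xs))
    return cond_mean, cond_var
```

The conditional variance returned here is the uncorrected one-step uncertainty; it does not include the a posteriori covariance correction described in the abstract.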

Paper Structure

This paper contains 19 sections, 2 theorems, 29 equations, 7 figures, 2 tables.

Key Result

Lemma Appendix B.1

(Result 4.7 in Section 4.2 in johnson2002applied) Let $\mathcal{N}_p(\boldsymbol{\mu},\boldsymbol{\Sigma})$ denote a $p$-variate normal distribution with location $\boldsymbol{\mu}$ and known covariance $\boldsymbol{\Sigma}$. Let $\mathbf x \sim \mathcal{N}_p(\boldsymbol{\mu},\boldsymbol{\Sigma})$.
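For the bivariate case, a standard fact about $\mathbf x \sim \mathcal{N}_2(\boldsymbol{\mu},\boldsymbol{\Sigma})$ is that the squared Mahalanobis distance $(\mathbf x-\boldsymbol{\mu})^T\boldsymbol{\Sigma}^{-1}(\mathbf x-\boldsymbol{\mu})$ follows a chi-square distribution with 2 degrees of freedom, which is what makes 2D confidence ellipses computable. The sketch below illustrates that fact with hypothetical helper names; it is not taken from the paper. It relies on the closed-form chi-square quantile available for 2 degrees of freedom.

```python
import math
from typing import Sequence, Tuple

Vec2 = Sequence[float]
Mat2 = Sequence[Sequence[float]]  # symmetric positive definite 2x2 covariance


def chi2_quantile_2dof(level: float) -> float:
    """Chi-square quantile for 2 dof; the CDF is 1 - exp(-q / 2)."""
    if not 0.0 < level < 1.0:
        raise ValueError("level must be in (0, 1)")
    return -2.0 * math.log(1.0 - level)


def mahalanobis_sq(point: Vec2, mean: Vec2, cov: Mat2) -> float:
    """Squared Mahalanobis distance using the explicit 2x2 inverse."""
    (a, b), (c, d) = cov
    det = a * d - b * c
    dx, dy = point[0] - mean[0], point[1] - mean[1]
    return (d * dx * dx - (b + c) * dx * dy + a * dy * dy) / det


def in_confidence_region(point: Vec2, mean: Vec2, cov: Mat2,
                         level: float = 0.68) -> bool:
    """True if `point` lies inside the `level` confidence ellipse."""
    return mahalanobis_sq(point, mean, cov) <= chi2_quantile_2dof(level)


def ellipse_semi_axes(cov: Mat2, level: float = 0.68) -> Tuple[float, float]:
    """Semi-axis lengths (major, minor): sqrt(quantile * eigenvalue)."""
    (a, b), (_, d) = cov
    half_trace = (a + d) / 2.0
    root = math.sqrt(((a - d) / 2.0) ** 2 + b * b)
    q = chi2_quantile_2dof(level)
    return (math.sqrt(q * (half_trace + root)),
            math.sqrt(q * (half_trace - root)))
```

With an identity covariance and `level=0.68`, the region is a circle of radius about 1.51, noticeably wider than the one-standard-deviation interval that covers 68% in one dimension.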

Figures (7)

  • Figure 1: Flowchart of the entire algorithm. Top: Diagram of the GP model for the MJO forecast. The blue arrows indicate the order of operations in the algorithm. $t^*$ represents the predicted timestamp, and $\text{Bias}^2$ is the square of the bias between the predicted values and the true observations. Bottom: Iterated method for multistep time series forecasting for two outputs with lag = $L$, lead time = $\tau$ ($\tau>L$). $z_t^{(1)}$, $z_t^{(2)}$ are the values of RMM1 and RMM2 at time $t$. The green arrows indicate one-day-ahead predictions. The red arrows indicate the moving window of the predictors. The pink arrow indicates that the predictions from the previous step are included as predictors in the current step. See Section \ref{sec:alg} for more details.
  • Figure 2: Cross-correlations and auto-correlations of RMMs with maximum lag = 60 days.
  • Figure 3: Prediction skill quantifiers and errors of the GP model with lag $L=40$ and $L=60$, compared to three models in the sub-seasonal to seasonal prediction project (S2S). Top: COR (higher is better), RMSE (lower is better), and phase error in degrees (lower is better) over 528 predictions. Bottom: Amplitude error (lower absolute value is better), CRPS (lower is better), and log score (negative log-likelihood, lower is better) over 528 predictions. Red and orange lines represent the GP model with lag $L=40$ and $L=60$, respectively; green lines represent the European Centre for Medium-Range Weather Forecasts (ECMWF); blue lines represent the Bureau of Meteorology (BOM); purple lines represent the Centre National de Recherches Météorologiques (CNRM).
  • Figure 4: HSS heatmap for the GP model over 528 predictions with lag $L = 40$ (higher HSS is better). Cells marked with a black cross "X" are significant under Fisher's exact test at significance level $\alpha=0.05$.
  • Figure 5: Left: 60-day MJO phase diagram for Nov–03–2012 to Jan–01–2013 with lag $L = 40$. Black lines are observations (truth). Olive lines are predictions in November, and olive shadings are 68% confidence regions (CR) in November. Dark blue lines are predictions in December, and dark blue shadings are CR in December. Red lines are predictions in January, and red shadings are CR in January. Right: 60-day MJO phase diagram for Jan–14–2013 to Mar–14–2013 with lag $L = 40$. Black lines are observations (truth). Red lines are predictions in January, and red shadings are CR in January. Purple lines are predictions in February, and purple shadings are CR in February. Cyan lines are predictions in March, and cyan shadings are CR in March.
  • ...and 2 more figures
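
The COR and RMSE quantifiers named in the Figure 3 caption are not defined in this summary. A minimal sketch follows, assuming the bivariate correlation and bivariate root-mean-square error commonly used to verify RMM forecasts; the function names are hypothetical, and the paper's exact definitions may differ.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]  # (RMM1, RMM2)


def bivariate_cor(obs: Sequence[Point], pred: Sequence[Point]) -> float:
    """Bivariate correlation between observed and predicted RMM pairs.

    Assumed form: sum of inner products divided by the product of the
    root-sum-squared amplitudes of each set.
    """
    num = sum(a1 * b1 + a2 * b2 for (a1, a2), (b1, b2) in zip(obs, pred))
    obs_norm = math.sqrt(sum(a1 * a1 + a2 * a2 for a1, a2 in obs))
    pred_norm = math.sqrt(sum(b1 * b1 + b2 * b2 for b1, b2 in pred))
    return num / (obs_norm * pred_norm)


def bivariate_rmse(obs: Sequence[Point], pred: Sequence[Point]) -> float:
    """Root-mean-square of the Euclidean error in the (RMM1, RMM2) plane."""
    total = sum((a1 - b1) ** 2 + (a2 - b2) ** 2
                for (a1, a2), (b1, b2) in zip(obs, pred))
    return math.sqrt(total / len(obs))


if __name__ == "__main__":
    truth = [(1.0, 0.0), (0.0, 1.0)]
    rotated = [(0.0, 1.0), (-1.0, 0.0)]  # truth rotated by 90 degrees
    print(bivariate_cor(truth, truth), bivariate_rmse(truth, truth))
    print(bivariate_cor(truth, rotated), bivariate_rmse(truth, rotated))
```

Both functions are evaluated per lead time: `obs` and `pred` hold the pairs for all forecasts at one fixed lead, and the curve in a plot like Figure 3 comes from repeating the calculation across leads.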

Theorems & Definitions (4)

  • Lemma Appendix B.1
  • proof
  • Lemma Appendix B.2
  • proof