Beyond Ensemble Averages: Leveraging Climate Model Ensembles for Subseasonal Forecasting

Elena Orlova, Haokun Liu, Raphael Rossellini, Benjamin A. Cash, Rebecca Willett

TL;DR

This work tackles subseasonal forecasting (SSF) by treating lagged physics-based ensemble forecasts as rich features for ML post-processing. By incorporating the full ensemble (not just the mean), lagged observations, principal components (PCs) of sea surface temperature (SST), and spatial encodings within linear regression (LR), random forest (RF), and U-Net models, and by employing nonlinear stacking, the authors improve predictions of monthly precipitation and 2 m temperature two weeks ahead over the continental United States (CONUS). Across regression, quantile regression, and tercile classification tasks, the stacked models consistently outperform climatology and ensemble-mean baselines, with notable gains in temperature forecasts and extreme-event prediction. The study demonstrates the value of ensemble diversity and spatial information for SSF and outlines future directions, such as transformers and uncertainty quantification, to further enhance operational relevance.

Abstract

Producing high-quality forecasts of key climate variables, such as temperature and precipitation, on subseasonal time scales has long been a gap in operational forecasting. This study explores an application of machine learning (ML) models as post-processing tools for subseasonal forecasting. Lagged numerical ensemble forecasts (i.e., an ensemble where the members have different initialization dates) and observational data, including relative humidity, pressure at sea level, and geopotential height, are incorporated into various ML methods to predict monthly average precipitation and two-meter temperature two weeks in advance for the continental United States. For regression, quantile regression, and tercile classification tasks, we consider using linear models, random forests, convolutional neural networks, and stacked models (a multi-model approach based on the prediction of the individual ML models). Unlike previous ML approaches that often use ensemble mean alone, we leverage information embedded in the ensemble forecasts to enhance prediction accuracy. Additionally, we investigate extreme event predictions that are crucial for planning and mitigation efforts. Considering ensemble members as a collection of spatial forecasts, we explore different approaches to using spatial information. Trade-offs between different approaches may be mitigated with model stacking. Our proposed models outperform standard baselines such as climatological forecasts and ensemble means. In addition, we investigate feature importance, trade-offs between using the full ensemble or only the ensemble mean, and different modes of accounting for spatial variability.
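
To make two of the ideas in the abstract concrete, the following is a minimal sketch, not the authors' implementation. It shows how one location's ensemble forecasts could be turned into features in three ways (ensemble mean only, full ensemble, sorted ensemble) and how a tercile class could be assigned to a value relative to a climatological sample. The function names `ensemble_features` and `tercile_label`, the `mode` argument, and the class encoding 0/1/2 are all hypothetical choices made for illustration.

```python
from statistics import mean, quantiles


def ensemble_features(members, mode="full"):
    """Turn one location's ensemble forecasts into a feature list.

    mode="mean"   -> a single feature, the ensemble mean
    mode="full"   -> every member, in the order given
    mode="sorted" -> every member, sorted ascending
    (Hypothetical helper; the paper's feature pipeline may differ.)
    """
    members = list(members)
    if mode == "mean":
        return [mean(members)]
    if mode == "full":
        return members
    if mode == "sorted":
        return sorted(members)
    raise ValueError(f"unknown mode: {mode!r}")


def tercile_label(value, climatology):
    """Classify a value against the terciles of a climatological sample.

    Returns 0 (lower tercile), 1 (middle tercile), or 2 (upper tercile).
    The cut points come from statistics.quantiles with n=3; this is an
    assumed convention, not necessarily the one used in the paper.
    """
    lower, upper = quantiles(climatology, n=3)
    if value < lower:
        return 0
    if value > upper:
        return 2
    return 1


# Usage: three hypothetical ensemble members and a toy climatology.
members = [3.0, 1.0, 2.0]
climatology = [1, 2, 3, 4, 5, 6, 7, 8, 9]
features_mean = ensemble_features(members, mode="mean")
features_full = ensemble_features(members, mode="full")
label = tercile_label(8.0, climatology)
```

The mean-only mode discards the spread between members, while the full and sorted modes keep it, which is the information the abstract says the models exploit.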

Paper Structure

This paper contains 63 sections, 17 equations, 25 figures, and 20 tables.

Figures (25)

  • Figure 1: An illustration of different forecasting paradigms: (a) spatial independence models, with a separate model for each spatial location and no accounting for spatial information; (b) conditional spatial independence models, with one model for all locations that may take spatial information into account; (c) spatial dependence models, which account for spatial information by design. For temperature prediction, “precipitation” in the illustration is replaced with “temperature”; the overall structure remains the same.
  • Figure 2: $R^2$ score heatmaps of baselines and learning-based methods for precipitation regression using NCEP-CFSv2 ensemble members; scores are computed over the test period. Positive values (blue) indicate better performance. See Section \ref{sec:precip_regr_ncep} for details.
  • Figure 3: $R^2$ score heatmaps of baselines and learning-based methods for temperature regression using NCEP-CFSv2 ensemble members; scores are computed over the test period. Positive values (blue) indicate better performance. See Section \ref{sec:tmp_regr_ncep} for details.
  • Figure 4: Test quantile loss heatmaps of baselines and learning-based methods for temperature quantile regression using the NCEP-CFSv2 dataset. Blue regions indicate smaller quantile loss. See Section \ref{sec:qregr_ncep_tmp} for details.
  • Figure 5: Precipitation regression test $R^2$ heatmaps of the LR, U-Net, RF, and stacked models trained using the ensemble mean only, the sorted or shuffled ensemble, or the full ensemble. The NCEP-CFSv2 ensemble is used. See Section \ref{sec:ens_members_analysis} for details.
  • ...and 20 more figures
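
The captions above report $R^2$ scores and quantile loss. As a reading aid, here is a minimal sketch of the standard textbook definitions of both metrics. It assumes the usual coefficient of determination and the usual pinball (quantile) loss; the paper's exact definitions, for example the reference forecast used in $R^2$, may differ. The function names are hypothetical.

```python
def r2_score(y_true, y_pred):
    """Standard coefficient of determination: 1 - SS_res / SS_tot.

    A value of 1 is a perfect fit, 0 matches predicting the mean of
    y_true, and negative values are worse than predicting that mean.
    """
    y_bar = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - y_bar) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def pinball_loss(y_true, y_pred, q):
    """Mean pinball (quantile) loss for quantile level q in (0, 1).

    Under-prediction is weighted by q and over-prediction by (1 - q),
    so a high q penalizes forecasts that fall below the observation.
    """
    total = 0.0
    for t, p in zip(y_true, y_pred):
        diff = t - p
        total += max(q * diff, (q - 1.0) * diff)
    return total / len(y_true)


# Usage on toy values (not data from the paper).
observed = [1.0, 2.0, 3.0]
forecast = [1.5, 2.0, 2.5]
score = r2_score(observed, forecast)
loss_q90 = pinball_loss(observed, forecast, q=0.9)
```

Under these definitions, the "positive values indicate better performance" reading of the $R^2$ heatmaps corresponds to beating a constant mean forecast, and smaller quantile loss is better.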