Generative machine learning methods for multivariate ensemble post-processing

Jieyu Chen, Tim Janke, Florian Steinke, Sebastian Lerch

TL;DR

The paper tackles the challenge of modeling multivariate dependencies in ensemble weather forecasts, a task poorly served by traditional two-step copula-based post-processing. It introduces a conditional generative model (CGM) that directly outputs multivariate forecast samples conditioned on exogenous predictors, trained via the energy score to avoid parametric assumptions. Across two German case studies (temperature and wind speed), CGM consistently outperforms state-of-the-art two-step methods (EMOS/DRN with ECC/GCA), especially in representing spatial dependencies, and benefits from larger sample sizes. This nonparametric, flexible framework enhances multivariate probabilistic forecasting and offers practical avenues for incorporating diverse predictors and higher-dimensional dependencies, with publicly available code.

Abstract

Ensemble weather forecasts based on multiple runs of numerical weather prediction models typically show systematic errors and require post-processing to obtain reliable forecasts. Accurately modeling multivariate dependencies is crucial in many practical applications, and various approaches to multivariate post-processing have been proposed in which ensemble predictions are first post-processed separately in each margin and multivariate dependencies are then restored via copulas. These two-step methods share common key limitations, in particular the difficulty of including additional predictors when modeling the dependencies. We propose a novel multivariate post-processing method based on generative machine learning to address these challenges. In this new class of nonparametric data-driven distributional regression models, samples from the multivariate forecast distribution are directly obtained as output of a generative neural network. The generative model is trained by optimizing a proper scoring rule which measures the discrepancy between the generated and observed data, conditional on exogenous input variables. Our method does not require parametric assumptions on univariate distributions or multivariate dependencies and allows for incorporating arbitrary predictors. In two case studies on multivariate temperature and wind speed forecasting at weather stations over Germany, our generative model shows significant improvements over state-of-the-art methods and particularly improves the representation of spatial dependencies.
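
To make the training objective concrete, the following is a minimal sketch of the standard sample-based estimator of the energy score, the proper scoring rule named in the TL;DR. It is not taken from the authors' code: the function name `energy_score`, the array shapes, and the use of NumPy are assumptions for illustration. An actual training loop would need a differentiable version inside a deep learning framework.

```python
import numpy as np

def energy_score(samples: np.ndarray, obs: np.ndarray) -> float:
    """Sample-based energy score estimate (lower is better).

    samples: shape (m, D), m draws from the multivariate forecast (assumed layout).
    obs:     shape (D,), the verifying observation.
    """
    samples = np.asarray(samples, dtype=float)
    obs = np.asarray(obs, dtype=float)
    # Mean Euclidean distance between each sample and the observation.
    term_obs = np.linalg.norm(samples - obs, axis=1).mean()
    # Mean pairwise Euclidean distance between samples
    # (averaged over all m*m pairs, including the zero i == j terms).
    diffs = samples[:, None, :] - samples[None, :, :]
    term_spread = np.linalg.norm(diffs, axis=2).mean()
    return float(term_obs - 0.5 * term_spread)

# Hypothetical usage: 50 generated samples for D = 5 stations.
rng = np.random.default_rng(0)
forecast_samples = rng.normal(size=(50, 5))
observation = np.zeros(5)
score = energy_score(forecast_samples, observation)
```

The first term rewards samples that lie close to the observation, while the second term rewards spread among the samples, so collapsing all samples onto a single point is not optimal unless the forecast is truly certain.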

Paper Structure

This paper contains 23 sections, 26 equations, 8 figures, 3 tables, and 1 algorithm.

Figures (8)

  • Figure 1: Locations of weather stations with (a) temperature and (b) wind speed observations.
  • Figure 2: Schematic illustration of the conditional generative model. The dimensions of the tensors at each step are indicated in the small box.
  • Figure 3: Boxplots of mean CRPS values of different multivariate post-processing methods with $D=5$, including the scores of raw ensemble forecasts. The scores are based on 242 unique stations in case of temperature, and 178 unique stations in case of wind speed.
  • Figure 4: Boxplots of (a) energy skill scores and (b) variogram skill scores of different multivariate post-processing methods for temperature across the 100 repetitions of the experiment with different sets of stations. EMOS+ECC is used as reference forecast in both cases.
  • Figure 5: As Figure 4, but for wind speed.
  • ...and 3 more figures