Increasing NWP Thunderstorm Predictability Using Ensemble Data and Machine Learning

Kianusch Vahid Yousefnia, Tobias Bölle, Christoph Metzl

TL;DR

The paper investigates how ensemble NWP data and ML can enhance thunderstorm forecasts by applying the neural network model SALAMA 1D and its ensemble-averaged variant, SALAMA 1D-EPS, to ICON-D2-EPS forecasts. It derives an analytic expression for the improvement in the Brier skill score (BSS) due to ensemble averaging, showing that correlations between ensemble members (γ) limit the gain and that larger, less correlated ensembles benefit more. Empirically, SALAMA 1D-EPS outperforms the single-member model across lead times: an 11-hour ensemble forecast reaches the skill of a 5-hour deterministic forecast, and the patterns identified by ML remain predictable for longer lead times than raw NWP output. These findings support the use of ensemble data and ML postprocessing as practical routes to better convection forecasts, and potentially to other ensemble-based binary classification tasks in meteorology.

Abstract

While numerical weather prediction (NWP) models are essential for forecasting thunderstorms hours in advance, NWP uncertainty, which increases with lead time, limits the predictability of thunderstorm occurrence. This study investigates how ensemble NWP data and machine learning (ML) can enhance the skill of thunderstorm forecasts. Using our recently introduced neural network model, SALAMA 1D, which identifies thunderstorm occurrence in operational forecasts of the convection-permitting ICON-D2-EPS model for Central Europe, we demonstrate that ensemble-averaging significantly improves forecast skill. Notably, an 11-hour ensemble forecast matches the skill level of a 5-hour deterministic forecast. To explain this improvement, we derive an analytic expression linking skill differences to correlations between ensemble members, which aligns with observed performance gains. This expression generalizes to any binary classification model that processes ensemble members individually. Additionally, we show that ML models like SALAMA 1D can identify patterns of thunderstorm occurrence which remain predictable for longer lead times compared to raw NWP output. Our findings quantitatively explain the benefits of ensemble-averaging and encourage the development of ML methods for thunderstorm forecasting and beyond.

Paper Structure

This paper contains 9 sections, 12 equations, 5 figures, 1 table.

Figures (5)

  • Figure 1: Study region for this work. The polygon vertices, counterclockwise from the bottom-left, read: (44.7°N, 1.2°E), (44.7°N, 15.8°E), (56.3°N, 17.8°E), (56.3°N, 1.8°W).
  • Figure 2: Reliability diagrams and bin-wise reliability and resolution for SALAMA 1D (upper panels) and SALAMA 1D-EPS (lower panels) for the lead times 0h (left), 4h (middle), and 8h (right). Shaded bands around the calibration functions denote uncertainties for a symmetric 90% confidence interval. Uncertainties are obtained from $10^4$ block bootstrap resamples, with day-wise block resampling.
  • Figure 3: Lead-time dependence of skill, quantified by the Brier skill score (BSS), of single-member forecasts (SALAMA 1D) and ensemble forecasts (SALAMA 1D-EPS) of thunderstorm occurrence. The lower panel shows the difference in skill, together with the prediction from the analytic expression (\ref{eq:diff_BSS}). Shaded bands correspond to sampling uncertainty for a symmetric 90% confidence interval. Uncertainties are obtained from $10^4$ block bootstrap resamples, with day-wise block resampling.
  • Figure 4: Sample covariance matrix $\text{Cov}[p^{(k)}, p^{(l)}]/10^{-3}$ of the member-wise probabilities $p^{(k)}, k = 1,\dots,N_\text{e}$, estimated for the test set at 0h lead time (left) and 4h lead time (right). If the members of the ensemble are exchangeable, the covariance matrix is fully determined by two numbers (one for the diagonal entries, one for the off-diagonal entries), which is approximately the case.
  • Figure 5: Lead-time dependence of skill, quantified by the calibration-blind skill score RES (\ref{eq:res}), for deterministic forecasts (left panel) and ensemble-averaged forecasts (right panel). Each panel displays the results for SALAMA 1D and a simple model based on raw NWP output without any ML corrections. For each line, we fit an exponential function $\propto\exp{(-t_\text{lead}/\tau)}$ to introduce a characteristic time scale $\tau$ of skill decay. In both panels, the skill of ML-based forecasts decays more slowly than that of raw NWP forecasts, as $\Delta \tau \equiv \tau (\text{ML})-\tau (\text{raw NWP}) > 0$. Shaded bands correspond to sampling uncertainty for a symmetric 90% confidence interval. Uncertainties are obtained from $10^4$ block bootstrap resamples, with day-wise block resampling.
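
The uncertainty bands in Figures 2, 3, and 5 come from block bootstrapping with day-wise blocks. The sketch below illustrates that idea for the BSS on synthetic data. It is not the authors' implementation: the function names, the `day` label array, the percentile-based interval, and the default of 1000 resamples (the captions report $10^4$) are assumptions made for illustration.

```python
import numpy as np


def brier_skill_score(p, y):
    """BSS relative to the sample base rate; NaN if the sample has a single class."""
    p, y = np.asarray(p, float), np.asarray(y, float)
    reference = np.mean((y.mean() - y) ** 2)
    if reference == 0.0:
        return float("nan")
    return 1.0 - float(np.mean((p - y) ** 2)) / float(reference)


def daywise_block_bootstrap_bss(p, y, day, n_resamples=1000, level=0.90, seed=0):
    """Symmetric percentile interval for the BSS.

    Whole days are drawn with replacement, so samples from the same day
    always stay together and within-day dependence is preserved.
    """
    p, y, day = np.asarray(p, float), np.asarray(y, float), np.asarray(day)
    rng = np.random.default_rng(seed)
    unique_days, inverse = np.unique(day, return_inverse=True)
    # Row indices belonging to each day (one block per day).
    blocks = [np.flatnonzero(inverse == d) for d in range(unique_days.size)]
    scores = np.empty(n_resamples)
    for b in range(n_resamples):
        drawn = rng.integers(0, unique_days.size, size=unique_days.size)
        idx = np.concatenate([blocks[d] for d in drawn])
        scores[b] = brier_skill_score(p[idx], y[idx])
    alpha = (1.0 - level) / 2.0
    low, high = np.nanquantile(scores, [alpha, 1.0 - alpha])
    return float(low), float(high)


# Synthetic example: 30 days with 200 samples each (illustrative values).
rng = np.random.default_rng(42)
day = np.repeat(np.arange(30), 200)
y = (rng.random(day.size) < 0.1).astype(float)
p = np.clip(0.1 + 0.6 * (y - 0.1) + 0.1 * rng.standard_normal(day.size), 0.0, 1.0)

low, high = daywise_block_bootstrap_bss(p, y, day)
print(f"BSS = {brier_skill_score(p, y):.3f}, 90% interval = [{low:.3f}, {high:.3f}]")
```

Resampling individual samples instead of whole days would treat samples from the same day as independent, which they typically are not for weather data, and would tend to understate the sampling uncertainty.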