Increasing NWP Thunderstorm Predictability Using Ensemble Data and Machine Learning
Kianusch Vahid Yousefnia, Tobias Bölle, Christoph Metzl
TL;DR
The paper investigates how ensemble NWP data and ML can improve thunderstorm forecasts, using the SALAMA 1D and SALAMA 1D-EPS models applied to ICON-D2-EPS forecasts. It derives an analytic expression for the gain in Brier Skill Score from ensemble averaging, showing that correlations between ensemble members (γ) limit the gain and that larger, less-correlated ensembles benefit more. Empirically, SALAMA 1D-EPS outperforms the single-member model across lead times: an 11-hour ensemble forecast reaches a skill similar to that of a 5-hour deterministic forecast, and the patterns identified by the ML model remain predictable at longer lead times than raw NWP output. These findings support exploiting ensemble data and ML postprocessing as practical routes to better convection forecasts, and potentially to other ensemble-based binary classification tasks in meteorology.
Abstract
While numerical weather prediction (NWP) models are essential for forecasting thunderstorms hours in advance, NWP uncertainty, which increases with lead time, limits the predictability of thunderstorm occurrence. This study investigates how ensemble NWP data and machine learning (ML) can enhance the skill of thunderstorm forecasts. Using our recently introduced neural network model, SALAMA 1D, which identifies thunderstorm occurrence in operational forecasts of the convection-permitting ICON-D2-EPS model for Central Europe, we demonstrate that ensemble-averaging significantly improves forecast skill. Notably, an 11-hour ensemble forecast matches the skill level of a 5-hour deterministic forecast. To explain this improvement, we derive an analytic expression linking skill differences to correlations between ensemble members, which aligns with observed performance gains. This expression generalizes to any binary classification model that processes ensemble members individually. Additionally, we show that ML models like SALAMA 1D can identify patterns of thunderstorm occurrence which remain predictable for longer lead times compared to raw NWP output. Our findings quantitatively explain the benefits of ensemble-averaging and encourage the development of ML methods for thunderstorm forecasting and beyond.
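The effect of ensemble averaging on skill can be illustrated with a minimal Python sketch. It is not the SALAMA 1D-EPS implementation: it assumes that a classifier has already produced one thunderstorm probability per ensemble member, that the ensemble forecast is the plain mean of those probabilities, and that the Brier Skill Score is measured against a constant climatological forecast. The function names and the toy numbers are hypothetical and chosen only for illustration.

```python
import numpy as np


def brier_score(prob, obs):
    """Mean squared difference between forecast probability and binary outcome."""
    prob = np.asarray(prob, dtype=float)
    obs = np.asarray(obs, dtype=float)
    return float(np.mean((prob - obs) ** 2))


def brier_skill_score(prob, obs):
    """Skill relative to a constant forecast of the observed base rate.

    Assumption: climatology is the reference; undefined if all outcomes are equal.
    """
    obs = np.asarray(obs, dtype=float)
    reference = brier_score(np.full_like(obs, obs.mean()), obs)
    return 1.0 - brier_score(prob, obs) / reference


def ensemble_average(member_probs):
    """Average per-member probabilities; input shape is (n_members, n_samples)."""
    return np.asarray(member_probs, dtype=float).mean(axis=0)


# Toy data (made up): four grid points, two ensemble members.
obs = np.array([1, 0, 1, 0])
member_probs = np.array([
    [0.9, 0.4, 0.5, 0.1],  # member 1 is unsure about the third point
    [0.5, 0.1, 0.9, 0.4],  # member 2 is unsure about the first point
])

bss_members = [brier_skill_score(p, obs) for p in member_probs]
bss_ensemble = brier_skill_score(ensemble_average(member_probs), obs)
```

In this toy case the two members err at different points, so their mean scores better than either member alone. If both rows were identical (fully correlated members), averaging would leave the score unchanged, which is the qualitative behavior that the correlation-dependent expression in the paper quantifies.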
