Developing Machine Learning-Based Watch-to-Warning Severe Weather Guidance from the Warn-on-Forecast System

Montgomery Flora, Samuel Varga, Corey Potvin, Noah Lang

Abstract

While machine learning (ML) post-processing of convection-allowing model (CAM) output for severe weather hazards (large hail, damaging winds, and/or tornadoes) has shown promise for very short lead times (0–3 hours), its application to slightly longer forecast windows remains relatively underexplored. In this study, we develop and evaluate a grid-based ML framework to predict the probability of severe weather hazards over the next 2–6 hours using forecast output from the Warn-on-Forecast System (WoFS). Our dataset includes WoFS ensemble forecasts valid every 5 minutes out to 6 hours from 108 days during the 2019–2023 NOAA Hazardous Weather Testbed Spring Forecasting Experiments. We train ML models to generate probabilistic forecasts of severe weather akin to Storm Prediction Center outlooks (i.e., the likelihood of a tornado, severe wind, or severe hail event within 36 km of each point). We compare a histogram gradient-boosted tree (HGBT) model and a deep learning U-Net approach against a carefully calibrated baseline generated from 2–5 km updraft helicity. Results indicate that the HGBT and U-Net outperform the baseline, particularly at higher probability thresholds. The HGBT achieves the best performance metrics, but its predicted probabilities cap at 60%, whereas the U-Net forecasts extend to 100%. Consistent with previous studies, the U-Net produces spatially smoother guidance than the tree-based method. These findings add to the growing evidence of the effectiveness of ML-based CAM post-processing for providing short-term severe weather guidance.
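
To make the prediction target concrete, the sketch below shows one way a gridded label of the form "a severe event occurred within 36 km of this point" could be built from a binary report grid. It is only an illustration, not the authors' code: the function name `neighborhood_target`, the 3 km grid spacing default, and the circular footprint are assumptions that the abstract does not specify.

```python
import numpy as np


def neighborhood_target(report_grid, radius_km=36.0, dx_km=3.0):
    """Flag every grid point that lies within radius_km of at least one report.

    report_grid : 2D array, nonzero where a severe report was gridded.
    dx_km       : grid spacing; 3 km is an assumed value for illustration.
    """
    reports = np.asarray(report_grid, dtype=bool)
    r = int(np.floor(radius_km / dx_km))  # search radius in grid points

    # Grid offsets whose physical distance from the centre is <= radius_km.
    offsets = [
        (dy, dx)
        for dy in range(-r, r + 1)
        for dx in range(-r, r + 1)
        if (dy * dx_km) ** 2 + (dx * dx_km) ** 2 <= radius_km ** 2
    ]

    ny, nx = reports.shape
    # Pad with "no report" so neighbourhoods near the edges stay in bounds.
    padded = np.pad(reports, r, mode="constant", constant_values=False)

    target = np.zeros_like(reports)
    for dy, dx in offsets:
        # Shifted view: element (y, x) holds reports[y + dy, x + dx].
        target |= padded[r + dy : r + dy + ny, r + dx : r + dx + nx]
    return target.astype(np.uint8)


if __name__ == "__main__":
    grid = np.zeros((41, 41), dtype=np.uint8)
    grid[20, 20] = 1  # one hypothetical report at the domain centre
    label = neighborhood_target(grid)
    print(label[20, 32], label[20, 33])
```

A single report therefore labels a disc of grid points rather than one cell, which is why forecasts trained on such a target resemble outlook-style probabilities rather than point predictions.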

Paper Structure (13 sections, 3 equations, 7 figures, 3 tables)

Figures (7)

  • Figure 1: Heatmap showing how frequently each $1^\circ \times 1^\circ$ region fell within a 900 km × 900 km WoFS domain during the 2019–2023 HWT-SFEs.
  • Figure 2: Illustration of the data pre-processing workflow for the tabular model inputs (left) and the U-Net inputs (right). The data array sizes ($n$) for a single sample are given. The subscripts are $t$ = time, $(y, x)$ = (latitude, longitude), $e$ = ensemble, and $v$ = variable/channel. The combined $xy$ index indicates the common flattening of the 2D array to create tabular inputs. The ensemble statistics used are provided in the bottom table.
  • Figure 3: Illustration of the U-Net architecture used in this study. Image and channel sizes are provided on each rectangle's lower left-hand side and top. A DoubleConv layer comprises 2D convolution, batch normalization, ReLU, dropout, 2D convolution, batch normalization, and then ReLU again. For skip connections, the output of a given encoder layer is passed directly to the corresponding decoder layer and concatenated to the UpConv layer input. Each skip connection spans the U-Net only to the decoder level with a similar spatial dimension.
  • Figure 4: Heatmaps of the NMEP cross-validation (cv)-mean Brier Skill Score (BSS) for each choice of 2–5 km UH threshold and neighborhood size. The BSS is calculated as the mean BSS across all validation folds using 5-fold cross-validation on the training set. The highest BSS is highlighted with a black outline.
  • Figure 5: Example forecasts from the U-Net (upper left), HGBT (upper right), and WoFS baseline NMEP (lower left). The guidance was issued for 12 May 2023 22-01 UTC from WoFS forecasts initialized at 20:00 UTC. SPC categorical outlooks issued at 1630 UTC the same day are overlaid as contours. The MRMS composite reflectivity at model initialization time is also shown. NCEI Storm Data reports of severe hail, severe wind, and tornadoes are shown as green, blue, and red markers, respectively.
  • ...and 2 more figures
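
The cv-mean Brier Skill Score described in the Figure 4 caption can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the helper names are hypothetical, and using each fold's observed base rate as the reference forecast is an assumption, since the captions do not state which reference the paper uses.

```python
def brier_score(probs, outcomes):
    """Mean squared difference between forecast probabilities and 0/1 outcomes."""
    return sum((p - o) ** 2 for p, o in zip(probs, outcomes)) / len(probs)


def brier_skill_score(probs, outcomes):
    """BSS = 1 - BS / BS_ref.

    BS_ref comes from a constant forecast equal to the sample base rate
    (an assumed reference, chosen here for illustration).
    """
    base_rate = sum(outcomes) / len(outcomes)
    bs_ref = brier_score([base_rate] * len(outcomes), outcomes)
    if bs_ref == 0:
        raise ValueError("reference Brier score is zero; BSS is undefined")
    return 1.0 - brier_score(probs, outcomes) / bs_ref


def cv_mean_bss(folds):
    """Average the BSS over validation folds given as (probs, outcomes) pairs."""
    scores = [brier_skill_score(probs, outcomes) for probs, outcomes in folds]
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Two toy validation folds with made-up values.
    folds = [
        ([1.0, 0.0, 1.0, 0.0], [1, 0, 1, 0]),  # perfect forecast, BSS = 1
        ([0.5, 0.5, 0.5, 0.5], [1, 0, 1, 0]),  # base-rate forecast, BSS = 0
    ]
    print(cv_mean_bss(folds))
```

In a threshold and neighborhood-size search like the one in Figure 4, a function such as `cv_mean_bss` would be evaluated once per candidate pair, and the pair with the highest mean score retained.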