A Benchmark Dataset for Tornado Detection and Prediction using Full-Resolution Polarimetric Weather Radar Data

Mark S. Veillette; James M. Kurdzo; Phillip M. Stepanian; John Y. N. Cho; Siddharth Samsi; Joseph McDonald

TL;DR

TorNet provides a public, full-resolution benchmark for tornado detection and prediction using polarimetric radar data from 2013–2022, enabling fair comparisons across ML approaches. The authors develop and benchmark several baselines, including a novel CNN that operates directly on raw radar imagery using CoordConv layers. This CNN outperforms both the operational Tornado Vortex Signature (TVS) algorithm and traditional ML models, and it also supports probability calibration and inference over full radar scenes. The work lays a foundation for real-time tornado monitoring and prediction, backed by cross-validation (CV) based evaluation, probabilistic calibration, and performance analyses from multiple perspectives. By releasing the dataset and code openly, the paper aims to accelerate ML-driven advances in tornado warning and prediction and to motivate future multi-modal fusion studies.
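The cross-validation procedure summarized in Figure 4 (5 folds, with the number of training epochs chosen at the minimum of the validation loss) can be sketched as follows. This is a minimal illustration, not the authors' code. It assumes a shuffled k-fold split of sample indices (the paper's own partitioning, described in its "Partitioning Data into Training and Testing" section, may differ, for example by keeping samples from the same event together), and it assumes the epoch is picked at the minimum of the fold-averaged validation loss. The helper names `kfold_indices` and `select_num_epochs` are hypothetical.

```python
import numpy as np


def kfold_indices(n_samples, n_folds=5, seed=0):
    """Yield (train_idx, val_idx) pairs for a shuffled k-fold split.

    A plain random split is an assumption; grouping by event or by year
    may be preferable when samples are correlated.
    """
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(n_samples), n_folds)
    for k in range(n_folds):
        val = folds[k]
        train = np.concatenate([folds[j] for j in range(n_folds) if j != k])
        yield train, val


def select_num_epochs(val_loss_per_fold):
    """Pick the epoch count that minimizes validation loss averaged over folds.

    val_loss_per_fold has shape (n_folds, n_epochs); entry [k, e] is the
    validation loss of fold k after epoch e + 1. Returns a 1-based epoch
    count and the fold-averaged loss curve.
    """
    losses = np.asarray(val_loss_per_fold, dtype=float)
    mean_curve = losses.mean(axis=0)  # average across folds
    best_epoch = int(np.argmin(mean_curve)) + 1
    return best_epoch, mean_curve
```

In this sketch the per-fold curves correspond to the thin lines in Figure 4 and `mean_curve` to the solid line; the final model would then be retrained for `best_epoch` epochs.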

Abstract

Weather radar is the primary tool used by forecasters to detect and warn for tornadoes in near-real time. To assist forecasters in warning the public, several algorithms have been developed to automatically detect tornadic signatures in weather radar observations. Recently, machine learning (ML) algorithms, which learn directly from large amounts of labeled data, have been shown to be highly effective for this purpose. Because tornadoes are extremely rare events within the corpus of all available radar observations, the selection and design of training datasets for ML applications is critical for the performance, robustness, and ultimate acceptance of ML algorithms. This study introduces a new benchmark dataset, TorNet, to support the development of ML algorithms for tornado detection and prediction. TorNet contains full-resolution, polarimetric, Level-II WSR-88D data sampled from 10 years of reported storm events. A number of ML baselines for tornado detection are developed and compared, including a novel deep learning (DL) architecture that processes raw radar imagery without the manual feature extraction required by existing ML algorithms. Despite not benefiting from manual feature engineering or other preprocessing, the DL model shows higher detection performance than the non-DL and operational baselines. The TorNet dataset, as well as the source code and model weights of the DL baseline trained in this work, are freely available.
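The input pipeline summarized in the Figure 3 caption (stacking radar variables along the channel dimension, flagging background values, normalizing each channel to [0, 1], giving CoordConv layers the radial coordinates of the chip, and global max pooling the output likelihood image) can be sketched in NumPy. This is a hedged illustration, not the released TorNet code: the normalization ranges in `ASSUMED_RANGES`, the `BACKGROUND_FLAG` value, the use of NaN to mark background gates, and the (range, azimuth) coordinate encoding are all assumptions, and every function name is hypothetical.

```python
import numpy as np

# Hypothetical per-variable (min, max) ranges for min-max normalization.
# These are placeholders, not the constants used by TorNet.
ASSUMED_RANGES = {
    "DBZ": (-20.0, 60.0),
    "VEL": (-60.0, 60.0),
    "KDP": (0.0, 10.0),
    "RHOHV": (0.2, 1.05),
    "ZDR": (-1.0, 8.0),
    "WIDTH": (0.0, 10.0),
}
# Hypothetical sentinel written into background (NaN) gates after normalization.
BACKGROUND_FLAG = -3.0


def preprocess(chip):
    """Stack radar variables as channels, normalize to [0, 1], flag background.

    chip maps variable name -> array of shape (n_az, n_rng, n_sweeps), with
    NaN marking background gates (an assumption). Returns an array of shape
    (n_az, n_rng, n_variables * n_sweeps).
    """
    channels = []
    for name, (lo, hi) in ASSUMED_RANGES.items():
        x = chip[name].astype(np.float32)
        x = np.clip((x - lo) / (hi - lo), 0.0, 1.0)  # NaN passes through clip
        x = np.where(np.isnan(x), BACKGROUND_FLAG, x)  # flag background gates
        channels.append(x)
    return np.concatenate(channels, axis=-1)


def coord_channels(n_az, n_rng, rng_km, az_deg):
    """Per-gate (range, azimuth) channels to concatenate for a CoordConv layer.

    rng_km and az_deg are (start, end) tuples. Returns shape (n_az, n_rng, 2)
    holding range in km and azimuth in radians.
    """
    rng = np.linspace(rng_km[0], rng_km[1], n_rng)
    az = np.deg2rad(np.linspace(az_deg[0], az_deg[1], n_az))
    r_grid, a_grid = np.meshgrid(rng, az)  # both (n_az, n_rng)
    return np.stack([r_grid, a_grid], axis=-1)


def chip_probability(logits):
    """Global max pooling of a per-gate logit image into one chip probability."""
    return 1.0 / (1.0 + np.exp(-np.max(logits)))


def bce(p, label, eps=1e-7):
    """Binary cross-entropy between a chip probability and its 0/1 label."""
    p = np.clip(p, eps, 1.0 - eps)
    return -(label * np.log(p) + (1 - label) * np.log(1.0 - p))
```

Because the sigmoid is monotonic, max pooling the logits and then applying the sigmoid gives the same chip probability as max pooling per-gate probabilities. A single strong gate anywhere in the chip is therefore enough to push the chip-level prediction toward 1, which lets a chip-level label supervise a per-gate output image.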

Paper Structure

This paper contains 21 sections, 15 figures, and 2 tables.

Introduction
Dataset Description
Dataset Structure
Sample Categories
Event Selection
Radar Image Processing
Machine Learning Applications
Partitioning Data into Training and Testing
Baseline Models
Tornado Vortex Signature
Logistic Regression
Random Forest
Convolutional Neural Network (CNN)
Baseline Results
Metrics of Performance
(6 further sections not listed)

Figures (15)

Figure 1: One sample chip from the TorNet dataset. Each sample contains imagery from two elevation sweeps (0.5° at left, 0.9° at right) of six radar variables: reflectivity factor (DBZ), radial velocity (VEL), specific differential phase (KDP), correlation coefficient (RHOHV), differential reflectivity (ZDR), and spectrum width (WIDTH). The variables cover a 60° by 80-km region (in azimuth and range, respectively) centered over the sampled location and time. The sample shown depicts one time frame of an EF-3 tornado near KGWX on 23 February 2019 (NSED event ID 799239).
Figure 2: (left) Number of samples created for each category: confirmed tornadoes, non-tornadic tornado warnings, and non-tornadic random cells. (right) Sample counts for each EF rating.
Figure 3: Deep learning architecture used for tornado detection. The radar variables shown at the top left are stacked along the channel dimension, background values are flagged, and each channel is normalized to the range [0, 1]. The normalized data are processed by multiple VGG-style blocks built from CoordConv convolution layers, which also ingest the radial coordinates of the radar chip. The network outputs an image of tornado likelihoods, which is reduced by a global max pooling layer before being compared to the known label of each image in the loss function.
Figure 4: Results of the 5-fold cross-validation for the CNN model. The plots show two metrics of classifier performance as a function of training epoch: loss (top) and AUC (bottom). The thin, dashed lines show the performance on the individual train/validation partitions, and the solid lines show the average over all folds. The optimal number of training epochs (10) was identified as the minimum of the loss on the validation set, indicated in the top plot.
Figure 5: ROC curve (left) and performance diagram (right) for the entire test set (confirmed tornadoes versus all nulls). Four baseline models are compared: TVS (represented as a single point because its output is deterministic), logistic regression, random forest, and CNN detectors. All ML models considerably outperform the TVS algorithm. In both the ROC curve and the performance diagram, the CNN model has the greatest area under the curve (AUC).
(10 further figures not listed)
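The two views in Figure 5 can be computed from per-sample labels and detector scores. The sketch below is illustrative rather than the paper's evaluation code: it computes ROC AUC through the Mann-Whitney rank statistic and, for one decision threshold, the probability of detection (POD), success ratio (SR, equal to 1 − false alarm ratio), and critical success index (CSI) that define a point on a performance diagram. The function names are hypothetical. A deterministic detector such as TVS produces 0/1 scores, so it collapses to a single point.

```python
import numpy as np


def roc_auc(y_true, scores):
    """Area under the ROC curve via the Mann-Whitney U statistic (ties count half)."""
    y_true = np.asarray(y_true, dtype=bool)
    scores = np.asarray(scores, dtype=float)
    pos, neg = scores[y_true], scores[~y_true]
    # Compare every positive (tornadic) score with every negative (null) score.
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (len(pos) * len(neg))


def performance_point(y_true, scores, threshold):
    """POD, success ratio, and CSI for a single decision threshold."""
    y_true = np.asarray(y_true, dtype=bool)
    y_pred = np.asarray(scores, dtype=float) >= threshold
    hits = np.sum(y_pred & y_true)
    misses = np.sum(~y_pred & y_true)
    false_alarms = np.sum(y_pred & ~y_true)
    pod = hits / (hits + misses)
    sr = hits / (hits + false_alarms) if hits + false_alarms else np.nan
    csi = hits / (hits + misses + false_alarms)
    return pod, sr, csi
```

Sweeping `threshold` over [0, 1] traces a model's curve on the performance diagram, whose CSI contours follow CSI = 1 / (1/SR + 1/POD − 1).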
