Do AI models predict storm impacts as accurately as physics-based models? A case study of the February 2020 storm series over the North Atlantic
Hilla Afargan-Gerstman, Rachel W.-Y. Wu, Alice Ferrini, Daniela I. V. Domeisen
TL;DR
This study benchmarks data-driven AI forecasts against physics-based forecasts for a series of extratropical cyclones that affected the UK in quick succession in February 2020, using WeatherBench 2 data. It focuses on mean sea level pressure (MSLP) and 10-m wind anomalies to assess both storm evolution and the potential for impact warnings. AI models such as GraphCast and Pangu-Weather can match or exceed the ECMWF ensemble mean in forecasting wind impacts on weekly timescales, although individual ensemble members of the physics-based model can still outperform the AI models in some cases. A key finding is that AI forecasts exhibit weaker physical consistency, as shown by weaker correlations between errors in storm intensity and errors in surface winds. This underscores the value of hybrid approaches that combine data-driven and physics-based forecasts. The work highlights practical implications for impact forecasting and preparedness, suggests ensemble sub-selection and hybrid frameworks to exploit the strengths of both paradigms, and calls for broader systematic validation across diverse extreme-event scenarios.
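The idea behind ensemble sub-selection is that, even when the ensemble mean is beaten by an AI forecast, some individual physics-based members may still verify better. A minimal sketch of that comparison, assuming RMSE over a single synthetic series as the skill measure; the study's actual metric, verification region, and lead times are not specified here, and the function names and toy numbers are hypothetical.

```python
import math
from typing import Dict, List


def rmse(forecast: List[float], observed: List[float]) -> float:
    """Root-mean-square error between a forecast series and the verifying series."""
    if len(forecast) != len(observed) or not observed:
        raise ValueError("series must be non-empty and of equal length")
    return math.sqrt(sum((f - o) ** 2 for f, o in zip(forecast, observed)) / len(observed))


def members_beating_reference(
    members: Dict[str, List[float]],
    reference: List[float],
    observed: List[float],
) -> List[str]:
    """Names of ensemble members with lower RMSE than the reference forecast, best first."""
    ref_err = rmse(reference, observed)
    scores = {name: rmse(fc, observed) for name, fc in members.items()}
    better = [name for name, err in scores.items() if err < ref_err]
    return sorted(better, key=lambda name: scores[name])


# Synthetic wind-speed anomalies (m/s) at four verification times; not data from the study.
observed = [2.0, 5.0, 8.0, 4.0]
ai_forecast = [1.0, 4.0, 6.0, 4.0]  # hypothetical single deterministic AI forecast
members = {
    "m01": [2.0, 5.0, 7.0, 4.0],
    "m02": [0.0, 3.0, 5.0, 3.0],
    "m03": [2.0, 4.0, 7.0, 5.0],
}

print(members_beating_reference(members, ai_forecast, observed))  # ['m01', 'm03']
```

Here the hypothetical AI forecast has an RMSE of about 1.22 m/s, while members `m01` (0.5 m/s) and `m03` (about 0.87 m/s) verify better, so they would be the candidates retained by a sub-selection step of this kind.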
Abstract
The emergence of data-driven weather forecast models holds great promise for producing faster and computationally cheaper weather forecasts than physics-based numerical models. However, while the performance of artificial intelligence (AI) models has been evaluated primarily for average conditions and for single extreme weather events, less is known about their ability to capture sequences of extreme events, which are usually accompanied by multiple hazards. The storm series of February 2020 provides a prime example for evaluating how well AI models forecast storm impacts. This event was associated with severe surface impacts, including intense surface winds and heavy precipitation, which were amplified regionally by the close succession of three extratropical storms. In this study, we compare the performance of data-driven models with that of physics-based models in forecasting the February 2020 storm series over the United Kingdom. We show that on weekly timescales, AI models tend to outperform the numerical model in predicting mean sea level pressure (MSLP) and, to a lesser extent, surface winds. Nevertheless, certain ensemble members of the physics-based forecast system can perform as well as, or occasionally better than, the AI models. Moreover, weaker correlations between the forecast errors of different atmospheric variables suggest that AI models may overlook physical constraints. This analysis helps identify gaps and limitations in the use of data-driven models for impact warnings, and emphasizes the need to integrate such models with physics-based approaches for reliable impact forecasting.
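The physical-consistency argument rests on how errors in storm intensity co-vary with errors in surface winds: if a forecast deepens a low too much, a consistent model should, through the tighter pressure gradient, also produce winds that are too strong. A minimal sketch of such an error correlation, assuming intensity is measured by the storm's minimum MSLP and wind by its peak 10-m speed, correlated across a set of forecasts; the study's exact error definitions are not given here, and the function names and numbers are hypothetical.

```python
import math
from typing import List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length samples."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two samples of equal length >= 2")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def intensity_wind_error_correlation(
    fc_min_mslp: List[float],
    fc_max_wind: List[float],
    obs_min_mslp: float,
    obs_max_wind: float,
) -> float:
    """Correlate storm-intensity errors with peak-wind errors across a set of forecasts."""
    # Intensity error > 0 when the forecast low is deeper than observed.
    intensity_err = [obs_min_mslp - p for p in fc_min_mslp]
    # Wind error > 0 when the forecast peak wind is stronger than observed.
    wind_err = [w - obs_max_wind for w in fc_max_wind]
    return pearson(intensity_err, wind_err)


# Synthetic forecasts of one storm (hPa, m/s); illustrative numbers only.
obs_min_mslp, obs_max_wind = 950.0, 30.0
fc_min_mslp = [945.0, 950.0, 955.0, 960.0]
coupled_wind = [33.0, 30.0, 27.0, 24.0]    # deeper low -> stronger wind
decoupled_wind = [29.0, 32.0, 28.0, 31.0]  # wind errors unrelated to depth errors

r_coupled = intensity_wind_error_correlation(fc_min_mslp, coupled_wind, obs_min_mslp, obs_max_wind)
r_decoupled = intensity_wind_error_correlation(fc_min_mslp, decoupled_wind, obs_min_mslp, obs_max_wind)
print(round(r_coupled, 3), round(r_decoupled, 3))  # 1.0 -0.141
```

A correlation near 1 means wind errors follow intensity errors as the pressure-gradient reasoning implies; a correlation near zero means the two errors are decoupled, which is the kind of weaker coupling the abstract associates with the AI forecasts.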
