Do AI models predict storm impacts as accurately as physics-based models? A case study of the February 2020 storm series over the North Atlantic

Hilla Afargan-Gerstman, Rachel W.-Y. Wu, Alice Ferrini, Daniela I. V. Domeisen

TL;DR

This study benchmarks data-driven AI forecasts against physics-based forecasts for a sequence of extratropical cyclones over the UK in February 2020, using WeatherBench 2 data. It focuses on mean sea level pressure and 10-m wind anomalies to assess both storm evolution and potential impact warnings. AI models such as GraphCast and Pangu-Weather can match or exceed the ECMWF ensemble mean in forecasting wind impacts on weekly timescales, while individual physics-model ensemble members can still outperform the AI models in some cases. A key finding is that AI forecasts exhibit weaker physical consistency, evidenced by reduced error correlations between storm intensity and surface winds, which underscores the value of hybrid approaches that combine data-driven and physics-based insights. The work highlights practical implications for impact forecasting and preparedness, suggesting ensemble sub-selection and hybrid frameworks to leverage the strengths of both paradigms, and calls for broader systematic validation across diverse extreme-event scenarios.

Abstract

The emergence of data-driven weather forecast models holds great promise for producing faster, computationally cheaper weather forecasts than physics-based numerical models. However, while the performance of artificial intelligence (AI) models has been evaluated primarily for average conditions and single extreme weather events, less is known about their capability to capture sequences of extreme events, states that are usually accompanied by multiple hazards. The storm series in February 2020 provides a prime example for evaluating the performance of AI models for storm impacts. This event was associated with high surface impacts, including intense surface wind speeds and heavy precipitation, amplified regionally by the close succession of three extratropical storms. In this study, we compare the performance of data-driven models to physics-based models in forecasting the February 2020 storm series over the United Kingdom. We show that on weekly timescales, AI models tend to outperform the numerical model in predicting mean sea level pressure (MSLP) and, to a lesser extent, surface winds. Nevertheless, certain ensemble members within the physics-based forecast system can perform as well as, or occasionally outperform, the AI models. Moreover, weaker error correlations between atmospheric variables suggest that AI models may overlook physical constraints. This analysis helps to identify gaps and limitations in the ability of data-driven models to be used for impact warnings, and emphasizes the need to integrate such models with physics-based approaches for reliable impact forecasting.
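The comparison between an ensemble mean, individual ensemble members, and a deterministic AI forecast can be illustrated with a minimal sketch. It assumes root-mean-square error (RMSE) against a reference series as the skill measure; the metric the paper actually uses to pick a "best member" is not stated in this summary, and all numbers below are made up for illustration.

```python
import math


def rmse(forecast, truth):
    """Root-mean-square error between two equal-length sequences."""
    if len(forecast) != len(truth):
        raise ValueError("forecast and truth must have the same length")
    squared = [(f - t) ** 2 for f, t in zip(forecast, truth)]
    return math.sqrt(sum(squared) / len(squared))


def ensemble_mean(members):
    """Average all ensemble members at each time step."""
    return [sum(values) / len(values) for values in zip(*members)]


def best_member(members, truth):
    """Return (index, rmse) of the member with the lowest RMSE."""
    scores = [rmse(member, truth) for member in members]
    index = min(range(len(scores)), key=scores.__getitem__)
    return index, scores[index]


# Hypothetical daily area-mean MSLP in hPa (NOT data from the paper).
truth = [1005.0, 998.0, 985.0, 972.0, 980.0]
members = [
    [1006.0, 1000.0, 990.0, 984.0, 986.0],
    [1004.0, 997.0, 984.0, 974.0, 981.0],
    [1007.0, 1003.0, 996.0, 990.0, 992.0],
]
ai_forecast = [1005.0, 999.0, 987.0, 976.0, 982.0]

mean_score = rmse(ensemble_mean(members), truth)
ai_score = rmse(ai_forecast, truth)
best_index, best_score = best_member(members, truth)

print(f"ensemble mean RMSE: {mean_score:.2f} hPa")
print(f"AI forecast RMSE:   {ai_score:.2f} hPa")
print(f"best member #{best_index} RMSE: {best_score:.2f} hPa")
```

In this toy setup the averaged forecast smooths out the deepest pressure minimum, so a single well-placed member can score better than both the ensemble mean and the AI forecast. That is the kind of situation the abstract describes, although the real evaluation would of course depend on the actual forecast data.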

Paper Structure

This paper contains 10 sections, 1 equation, 5 figures, 1 table.

Figures (5)

  • Figure 1: (a) Time series of MSLP (dashed grey line) and 10-m wind speed (solid blue line) over the UK (12$^\circ$W-5$^\circ$E, 48$^\circ$N-60$^\circ$N) during the storm series in February 2020. (b) Trajectories of the three storms (Ciara, Dennis and Jorge) over the North Atlantic and Western Europe, based on daily minimum MSLP anomalies computed relative to the daily 30-year climatology (see the Methods section for details).
  • Figure 2: MSLP anomalies (shading) over the North Atlantic and Western Europe (20$^\circ$N-80$^\circ$N, 60$^\circ$W-20$^\circ$E) for the days of peak intensity on 9 February (upper row), 16 February (middle row) and 29 February (bottom row). Data are derived from (a) ERA5 reanalysis, (b) IFS ENS mean, (c) GraphCast, and (d) Pangu-Weather. Anomalies are computed relative to the 1990–2019 daily climatology in ERA5.
  • Figure 3: Same as Fig. 2, but for 10-m wind speed (shading).
  • Figure 4: Predicted time series of (a) MSLP and (b) 10-m wind averaged over the UK for the three initialization dates: 1 February (for Storm Ciara), 8 February (Storm Dennis), and 21 February (Storm Jorge). Forecasts are plotted for three models: the physics-based model IFS (blue solid line) and two AI models, GraphCast (orange) and Pangu-Weather (green). IFS ensemble members are plotted as solid grey lines and the ensemble mean as a dashed blue line. The best IFS member for each initialization is highlighted as a solid blue line. ERA5 is plotted as a dotted grey line.
  • Figure 5: (a) Scatter plot of daily values of maximum 10-m wind speed anomaly (averaged over the UK; Fig. 1) and minimum MSLP anomaly (averaged over the Euro-Atlantic region), averaged across all initializations. Each dataset is represented by a different color: grey for ERA5 observations, blue for IFS ENS mean, orange for GraphCast, and green for Pangu-Weather. A linear regression line is fitted to each dataset, and the corresponding regression coefficient (r) is indicated. (b) Same as panel (a), but for the relationship between maximum 10-m wind speed bias and minimum MSLP bias.
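
The captions above rely on two operations: anomalies computed relative to a daily climatology, and a correlation between the wind bias and the MSLP bias of a forecast. The following is a minimal sketch of both, assuming that r is a Pearson correlation coefficient and that bias means forecast minus reference; neither definition is spelled out in this summary. The function names and all values are hypothetical.

```python
import math


def daily_anomaly(values, climatology):
    """Subtract the climatological value for each day from the daily value."""
    if len(values) != len(climatology):
        raise ValueError("values and climatology must have the same length")
    return [v - c for v, c in zip(values, climatology)]


def bias(forecast, reference):
    """Forecast minus reference, day by day."""
    if len(forecast) != len(reference):
        raise ValueError("forecast and reference must have the same length")
    return [f - r for f, r in zip(forecast, reference)]


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical daily values (NOT data from the paper).
clim_mslp = [1012.0, 1012.0, 1011.0, 1011.0]  # hPa, daily climatology
clim_wind = [8.0, 8.0, 8.0, 8.0]              # m/s, daily climatology
ref_mslp = [1002.0, 985.0, 970.0, 978.0]      # reference minimum MSLP
ref_wind = [9.0, 13.0, 17.0, 14.0]            # reference maximum 10-m wind
fc_mslp = [1003.0, 989.0, 978.0, 982.0]       # forecast minimum MSLP
fc_wind = [8.5, 12.0, 15.0, 13.0]             # forecast maximum 10-m wind

# Panel (a) analogue: relationship between the two anomaly series.
r_anomaly = pearson_r(
    daily_anomaly(ref_wind, clim_wind), daily_anomaly(ref_mslp, clim_mslp)
)

# Panel (b) analogue: relationship between the two bias series.
r_bias = pearson_r(bias(fc_wind, ref_wind), bias(fc_mslp, ref_mslp))

print(f"anomaly correlation: {r_anomaly:.2f}")
print(f"bias correlation:    {r_bias:.2f}")
```

In this toy example a forecast low that is too shallow coincides with winds that are too weak, so the two biases are strongly anti-correlated. A weaker bias correlation would indicate that wind errors are less tightly tied to pressure errors, which is how the summary interprets reduced physical consistency in the AI forecasts.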