On the Predictive Skill of Artificial Intelligence-based Weather Models for Extreme Events using Uncertainty Quantification
Rodrigo Almeida, Noelia Otero, Miguel-Ángel Fernández-Torres, Jackie Ma
TL;DR
AI-based weather forecasting struggles to represent uncertainty for extremes. This paper evaluates three deterministic AI-based weather prediction (AIWP) models (FuXi, GraphCast, SFNO) under initial-condition perturbations, forming 50-member ensembles for the August 2022 Pakistan floods and China heatwave, benchmarked against ERA5 and compared with ENS and AIFSENS. Flow-dependent perturbations, especially Huge Ensembles (HENS), improve ensemble realism and probabilistic skill (ROCSS, CRPS) relative to Gaussian perturbations, narrowing the gap with NWP but not closing it. Temperature extremes are captured more reliably than precipitation extremes, highlighting limits tied to subgrid physics. The findings motivate hybrid strategies that integrate flow-dependent perturbations with latent-space uncertainty modeling to enable more trustworthy AI-driven early warnings.
Abstract
Accurate prediction of extreme weather events remains a major challenge for artificial intelligence-based weather prediction systems. While deterministic models such as FuXi, GraphCast, and SFNO have achieved competitive forecast skill relative to numerical weather prediction, their ability to represent uncertainty and capture extremes is still limited. This study investigates how state-of-the-art deterministic artificial intelligence-based models respond to initial-condition perturbations and evaluates the resulting ensembles in forecasting extremes. Using three perturbation strategies (Gaussian noise, Hemispheric Centered Bred Vectors, and Huge Ensembles), we generate 50-member ensembles for two major events in August 2022: the Pakistan floods and the China heatwave. Ensemble skill is assessed against ERA5 and compared with IFS ENS and the probabilistic AIFSENS model using deterministic and probabilistic metrics. Results show that flow-dependent perturbations produce the most realistic ensemble spread and the highest probabilistic skill, narrowing but not closing the performance gap with numerical weather prediction ensembles. Across variables, artificial intelligence-based weather models capture temperature extremes more effectively than precipitation extremes. These findings demonstrate that input perturbations can extend deterministic models toward probabilistic forecasting, paving the way for approaches that combine flow-dependent perturbations with generative or latent-space uncertainty modeling for reliable artificial intelligence-driven early warning systems.
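As an illustration of the simplest of the three strategies, the sketch below builds a 50-member ensemble by adding Gaussian noise to an initial state and scores an ensemble with the standard empirical CRPS estimator. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names, the noise amplitude `sigma`, and the toy grid are hypothetical, and the forecast model that would propagate each perturbed state forward is omitted.

```python
import numpy as np


def gaussian_perturbed_ensemble(initial_state, n_members=50, sigma=0.1, seed=0):
    """Return n_members copies of initial_state with additive Gaussian noise.

    sigma is a hypothetical noise amplitude; the paper's value is not given here.
    """
    initial_state = np.asarray(initial_state, dtype=float)
    rng = np.random.default_rng(seed)
    noise = rng.normal(0.0, sigma, size=(n_members,) + initial_state.shape)
    return initial_state[None, ...] + noise


def ensemble_crps(members, observation):
    """Empirical CRPS per grid point: E|X - y| - 0.5 * E|X - X'|.

    members has shape (n_members, ...); observation has shape (...).
    """
    members = np.asarray(members, dtype=float)
    observation = np.asarray(observation, dtype=float)
    # Mean absolute error of each member against the verifying value.
    term_obs = np.abs(members - observation[None, ...]).mean(axis=0)
    # Mean absolute difference over all member pairs (ensemble spread).
    pairwise = np.abs(members[:, None, ...] - members[None, :, ...])
    term_spread = pairwise.mean(axis=(0, 1))
    return term_obs - 0.5 * term_spread


if __name__ == "__main__":
    # Toy 4x4 "analysis" field standing in for a real initial condition.
    analysis = np.zeros((4, 4))
    ensemble = gaussian_perturbed_ensemble(analysis, n_members=50, sigma=0.1)
    # In practice each member would be run through the forecast model first.
    print(ensemble_crps(ensemble, analysis).mean())
```

Lower CRPS is better. The score rewards ensembles that are both close to the verifying value and not overconfident, which is why it serves as a probabilistic counterpart to deterministic error measures.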
