Benchmarking Regional Thermodynamic Trends in an AI emulator, ACE2, and a hybrid model, NeuralGCM

Katharine Rucker; Ian Baxter; Pedram Hassanzadeh; Tiffany A. Shaw; Hamid A. Pahlavan

TL;DR

The study benchmarks two AI climate models, the emulator ACE2 and the hybrid model NeuralGCM, against ERA5 and physics-based AMIP ensembles to test how well they reproduce satellite-era regional thermodynamic trends. ACE2 is fully data-driven, while NeuralGCM couples a dynamical core to neural subgrid parameterizations; both are evaluated on trends across latitude bands, vertical structure, extremes, and drying over 1981–2014. The AI models match or exceed physics-based models for several regional signals, notably Arctic Amplification and tropical upper-tropospheric warming, and ACE2 often performs best on the vertical structure of extratropical temperature trends. However, neither the AI models nor the physics-based models consistently capture heat-extreme or drying trends in the US Southwest and other arid regions. This underscores the importance of explicit land representation and land-atmosphere coupling for regional skill, and suggests that AI models could complement traditional models if augmented with land processes.

Abstract

AI models have emerged as potential complements to physics-based models, but their skill in capturing observed regional climate trends with important societal impacts has not been explored. Here, we benchmark satellite-era regional thermodynamic trends, including extremes, in an AI emulator (ACE2) and a hybrid model (NeuralGCM). We also compare the AI models' skill to physics-based land-atmosphere models. Both AI models show skill in capturing regional temperature trends such as Arctic Amplification. ACE2 outperforms other models in capturing vertical temperature trends in the midlatitudes. However, the AI models do not capture regional trends in heat extremes over the US Southwest. Furthermore, they do not capture drying trends in arid regions, even though they generally perform better than physics-based models. Our results also show that a data-driven AI emulator can perform comparably to, or better than, hybrid and physics-based models in capturing regional thermodynamic trends.
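As a rough illustration of what benchmarking a regional trend involves, the sketch below computes an area-weighted regional mean and fits a linear trend to it over 1981–2014, then compares a model trend to a reference trend. It assumes an ordinary least-squares fit and cosine-latitude weighting; the summary above does not state the paper's actual trend method. The function names, the latitude band, and the synthetic data are all hypothetical.

```python
import numpy as np

def area_weighted_mean(field, lats_deg):
    """Cosine-latitude weighted mean of a (time, lat, lon) array.

    Assumption: a regular lat-lon grid, so cos(lat) approximates cell area.
    """
    weights = np.cos(np.deg2rad(lats_deg))
    zonal_mean = field.mean(axis=2)               # shape (time, lat)
    return (zonal_mean * weights).sum(axis=1) / weights.sum()

def linear_trend_per_decade(years, series):
    """Ordinary least-squares slope, expressed per decade."""
    centered = years - years.mean()               # centering improves conditioning
    slope, _intercept = np.polyfit(centered, series, 1)
    return 10.0 * slope

# Synthetic stand-ins for annual-mean temperature fields (hypothetical values).
years = np.arange(1981, 2015)                     # 1981..2014 inclusive
lats = np.linspace(66.0, 88.0, 12)                # a hypothetical Arctic band
n_lon = 8
shape = (years.size, lats.size, n_lon)

elapsed = (years - years[0])[:, None, None]
reference = np.broadcast_to(0.05 * elapsed, shape)   # 0.05 K per year everywhere
model = np.broadcast_to(0.03 * elapsed, shape)       # 0.03 K per year everywhere

ref_trend = linear_trend_per_decade(years, area_weighted_mean(reference, lats))
model_trend = linear_trend_per_decade(years, area_weighted_mean(model, lats))
trend_bias = model_trend - ref_trend              # negative: model warms too slowly

print(f"reference: {ref_trend:.2f} K/decade, model: {model_trend:.2f} K/decade")
```

With real data, the synthetic arrays would be replaced by regional subsets of the reanalysis and model output, and an ensemble would give a spread of model trends rather than a single value.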

Figures (13)