Deep Learning Atmospheric Models Reliably Simulate Out-of-Sample Land Heat and Cold Wave Frequencies

Zilu Meng, Gregory J. Hakim, Wenchang Yang, Gabriel A. Vecchi

TL;DR

The paper evaluates two deep-learning (DL) atmospheric models, DLESyM and NGCM, against a conventional physics-based GCM (HiRAM) in simulating land heatwaves and coldwaves under AMIP forcing (observed sea surface temperatures and sea ice) from 1900–2020, focusing on the out-of-sample period 1900–1960. Both DL models generalize to these unseen climate conditions with skill comparable to HiRAM, although performance varies by region and depends on how strongly surface temperatures are autocorrelated in time: DLESyM overestimates heatwave and coldwave frequencies because its temperatures are too persistent, while NGCM's persistence is closer to HiRAM's. A simple linear baseline and multiple verification datasets (20CRv3, ERA5, BE, HadISST) help attribute discrepancies to forcing and data limitations. The findings indicate that model architecture, in particular how physical constraints shape persistence, matters for estimates of extreme-event frequency, and they position DL-based GCMs as fast, scalable complements to traditional climate models that can generate large ensembles for robust uncertainty quantification.
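
The "degree of temporal autocorrelation" that the TL;DR links to extreme-event frequency is diagnosed in Figure 3 as the lag-k autocorrelation of daily mean temperature. A minimal Python sketch of that statistic follows, assuming a 1-D daily series (the paper's exact preprocessing, such as seasonal-cycle removal or detrending, is not stated in this summary). The helper name `lag_autocorr` and the synthetic AR(1) test series are hypothetical illustrations, not the authors' code.

```python
import numpy as np


def lag_autocorr(x, max_lag=10):
    """Sample autocorrelation r_k of a 1-D series for lags k = 0..max_lag.

    Uses the common biased estimator: each lag-k autocovariance sums the
    n - k available pairs but divides by n, the same divisor as the variance.
    """
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    n = x.size
    var = np.dot(x, x) / n
    return np.array([np.dot(x[: n - k], x[k:]) / n / var for k in range(max_lag + 1)])


# Synthetic daily anomalies from an AR(1) process, x[t] = phi * x[t-1] + noise,
# whose theoretical lag-k autocorrelation is phi**k.
rng = np.random.default_rng(0)
phi, n = 0.8, 50_000
noise = rng.standard_normal(n)
x = np.zeros(n)
for t in range(1, n):
    x[t] = phi * x[t - 1] + noise[t]

acf = lag_autocorr(x, max_lag=5)
print("sample:     ", np.round(acf, 2))
print("theoretical:", np.round(phi ** np.arange(6), 2))
```

For a gridded field, the same function could be applied along the time axis at each land grid point and then area-averaged to obtain a curve like Figure 3e; how the paper performs that averaging is not described here.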

Abstract

Deep learning (DL)-based general circulation models (GCMs) are emerging as fast simulators, yet their ability to reproduce extreme events outside their training range remains unknown. Here, we evaluate two such models, the hybrid Neural General Circulation Model (NGCM) and the purely data-driven Deep Learning Earth System Model (DLESyM), against a conventional high-resolution land-atmosphere model (HiRAM) in simulating land heatwaves and coldwaves. All models are forced with observed sea surface temperatures and sea ice over 1900–2020, and we focus on the out-of-sample early-20th-century period (1900–1960). Both DL models generalize successfully to these unseen climate conditions, broadly reproducing the frequency and spatial patterns of heatwaves and coldwaves during 1900–1960 with skill comparable to HiRAM. The exception is parts of North Asia and North America, where all models perform poorly during 1940–1960. Because of excessive temperature autocorrelation, DLESyM tends to overestimate heatwave and coldwave frequencies, whereas the physics-DL hybrid NGCM exhibits persistence closer to that of HiRAM.
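
The abstract attributes DLESyM's overestimated heatwave and coldwave frequencies to excessive temperature autocorrelation. The mechanism can be illustrated with a toy experiment: two AR(1) series with the same unit variance but different lag-1 autocorrelation, scored with an event definition that requires consecutive extreme days. The definition below (a day counts as an event day if it lies in a run of at least 3 consecutive days above the series' own 90th percentile, or below its 10th percentile for coldwaves) is a hypothetical stand-in chosen for illustration; the paper's actual thresholds, minimum duration, and reference period are not given in this summary.

```python
import numpy as np


def ar1(phi, n, rng):
    """AR(1) series scaled to unit stationary variance, so only persistence differs."""
    eps = rng.standard_normal(n) * np.sqrt(1.0 - phi**2)
    x = np.empty(n)
    x[0] = rng.standard_normal()  # draw the first value from the stationary distribution
    for t in range(1, n):
        x[t] = phi * x[t - 1] + eps[t]
    return x


def event_day_fraction(x, q=90.0, min_len=3, cold=False):
    """Fraction of days inside runs of >= min_len consecutive extreme days.

    Hypothetical definition: hot days exceed the q-th percentile of x;
    with cold=True, cold days fall below the (100 - q)-th percentile.
    """
    x = np.asarray(x, dtype=float)
    if cold:
        extreme = x < np.percentile(x, 100.0 - q)
    else:
        extreme = x > np.percentile(x, q)
    in_event = np.zeros(x.size, dtype=bool)
    run_start = None
    # Append a False sentinel so a run that reaches the last day is still closed.
    for t, flag in enumerate(np.append(extreme, False)):
        if flag and run_start is None:
            run_start = t
        elif not flag and run_start is not None:
            if t - run_start >= min_len:
                in_event[run_start:t] = True
            run_start = None
    return in_event.mean()


rng = np.random.default_rng(42)
n = 100_000
for phi in (0.6, 0.9):
    x = ar1(phi, n, rng)
    print(f"phi={phi}: heatwave-day fraction={event_day_fraction(x):.3f}, "
          f"coldwave-day fraction={event_day_fraction(x, cold=True):.3f}")
```

Because each series is thresholded at its own percentiles, both have about 10% hot and 10% cold days; the more persistent series simply packs more of those days into qualifying multi-day runs, which is one way excess autocorrelation can inflate event frequency even when the temperature distribution itself is unchanged.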

Paper Structure

This paper contains 12 sections, 4 equations, 17 figures, 1 table.

Figures (17)

  • Figure 1: Heatwave Frequency. (a–d) Time-averaged heatwave frequency from 1900 to 1960 in NGCM, DLESyM, HiRAM, and 20CRv3. (e–g) Correlation of each model's annual mean heatwave frequency with 20CRv3. (h–i) Correlation of annual mean heatwave frequency in HiRAM with DLESyM and NGCM. (j) Global mean of annual mean heatwave frequency as a function of time from 1900 to 1960. (k) Correlation matrix of global-mean annual heatwave frequency between each model, 20CRv3, and 20CR-BE.
  • Figure 2: Coldwave Frequency. As in Figure 1, but for coldwaves.
  • Figure 3: Autocorrelation of Daily Mean Temperature (1900–1960). (a–c) Differences in 1-day lag autocorrelation: (a) NGCM minus 20CRv3, (b) DLESyM minus 20CRv3, and (c) HiRAM minus 20CRv3. (d) 1-day lag autocorrelation for 20CRv3. (e) Global-mean land temperature autocorrelation as a function of lag time for NGCM (blue, solid), DLESyM (yellow, dashed), HiRAM (red, dotted), and 20CRv3 (gray, dotted).
  • Figure 4: Annual Temperature over America (left: 125°W–71°W, 20°N–58°N) and North Asia (right: 70°E–160°E, 40°N–73°N). (a–b) Annual ensemble-mean temperature over America (a) and North Asia (b). (c–f) Correlation matrices between all models, 20CR, Berkeley Earth (BE), and ERA20C during 1900–1960 (c, e) and 1960–2010 (d, f). (g–h) 40-year rolling correlations with Berkeley Earth. The spatial extent of the two regions is shown in a supplementary figure.
  • Figure 5: Distribution of temperature anomalies for 1900–1960. Log-scaled probability density distributions of temperature for DJF (left) and JJA (right) at Helsinki, Paris, Washington DC, Cairo, Mexico City, and Sydney. Anomalies are computed relative to the 1980–2010 reference period. Black denotes Berkeley Earth (BE), red NGCM, orange DLESyM, purple 20CR, and green HiRAM; lines are distributed above and below the abscissa for clarity. Shading represents the distribution, with thin vertical solid lines indicating the median (50th percentile) and dashed lines indicating the 1st and 99th percentiles.
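
Several captions rely on two simple statistics: correlation matrices between annual series (Figure 1k, Figure 4c–f) and 40-year rolling correlations with Berkeley Earth (Figure 4g–h). A minimal NumPy sketch of both follows. The synthetic "reference" and "model" series are hypothetical, and the trailing-window convention (the value at year y uses years y−39 to y) is an assumption, since the captions do not say how the rolling windows are aligned.

```python
import numpy as np


def rolling_corr(a, b, window=40):
    """Pearson correlation of a and b over each trailing window of `window` years.

    Entry i uses a[i - window + 1 : i + 1]; entries before the first full window are NaN.
    """
    a, b = np.asarray(a, float), np.asarray(b, float)
    out = np.full(a.size, np.nan)
    for i in range(window - 1, a.size):
        sl = slice(i - window + 1, i + 1)
        out[i] = np.corrcoef(a[sl], b[sl])[0, 1]
    return out


# Hypothetical annual-mean temperature anomalies for 1900-2010 sharing a common signal.
rng = np.random.default_rng(3)
years = np.arange(1900, 2011)
signal = 0.01 * (years - 1900) + 0.2 * rng.standard_normal(years.size)
series = {
    "reference": signal + 0.1 * rng.standard_normal(years.size),  # stands in for BE
    "model_a": signal + 0.2 * rng.standard_normal(years.size),
    "model_b": 0.3 * rng.standard_normal(years.size),  # shares no signal
}

# Correlation matrix between all series over the early period (compare Figure 4c, e).
early = years <= 1960
names = list(series)
corr = np.corrcoef(np.vstack([series[k][early] for k in names]))
print(names)
print(np.round(corr, 2))

# 40-year rolling correlation of each model with the reference (compare Figure 4g-h).
for k in ("model_a", "model_b"):
    rc = rolling_corr(series[k], series["reference"])
    print(k, "window ending", years[-1], "r =", round(rc[-1], 2))
```

Plotting `rolling_corr` against `years` gives a curve of the kind shown in Figure 4g–h; a centered window would shift that curve by about 20 years, so the alignment choice matters when comparing with the paper's panels.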