
LUCIE: A Lightweight Uncoupled ClImate Emulator with long-term stability and physical consistency for O(1000)-member ensembles

Haiwen Guan, Troy Arcomano, Ashesh Chattopadhyay, Romit Maulik

TL;DR

LUCIE is a lightweight, fully data-driven climate emulator that achieves long-term, stable, physically consistent simulations using as little as $2$ years of $6$-hourly ERA5 data. It combines a spherical Fourier neural operator backbone with a hard Euler integration constraint, a spectral regularizer on tendencies, and a novel Aggregated Gradient Method to enable data-efficient training. The model produces $100$-year simulations with $100$ ensemble members that closely match ERA5 climatology and variability, permits estimation of extremes, and trains on a single GPU in about $2.4$ hours, enabling rapid exploration of climate questions. Despite these successes, LUCIE shows limitations in some variability modes (e.g., Kelvin waves, NAM/SAM structure) and in extreme-event tails, pointing to future work on 3D humidity, ocean coupling, and extension to nonstationary climates.

Abstract

We present LUCIE, a lightweight, easy-to-train, low-resolution, fully data-driven climate emulator that can be trained on as little as $2$ years of $6$-hourly ERA5 data. Unlike most state-of-the-art AI weather models, LUCIE remains stable and physically consistent for $100$ years of autoregressive simulation with $100$ ensemble members. The long-term mean climatology of temperature, wind, precipitation, and humidity from LUCIE's simulations matches that of ERA5, as does the variability. We also demonstrate how well extreme weather events and their return periods can be estimated from a large ensemble of long-term simulations. We then discuss an improved training strategy with a hard-constrained first-order integrator to suppress autoregressive error growth, a novel spectral regularization strategy to better capture fine-scale dynamics, and an optimization algorithm that enables data-limited training (as little as $2$ years of $6$-hourly data) of the emulator without losing stability or physical consistency. Finally, we provide a scaling experiment that compares the long-term bias of LUCIE with respect to the number of training samples. Importantly, LUCIE is an easy-to-use model that can be trained in just $2.4$ hours on a single A100 GPU, which allows for multiple experiments exploring scientific questions that large ensembles of long-term simulations can answer, e.g., the impact of different variables on the simulation, the dynamic response to external forcing, and the estimation of extreme weather events.
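
The hard-constrained first-order integrator and the spectral regularization on tendencies mentioned above can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration: `toy_tendency` stands in for the trained network, the grid shape and time step are arbitrary, and a zonal FFT is used as a simple stand-in for whatever spectral transform and weighting the paper actually applies. It is not the authors' implementation.

```python
import numpy as np

def euler_step(state, tendency_fn, dt=1.0):
    """First-order (Euler) update: the model supplies only the tendency,
    and the next state is always state + dt * tendency by construction."""
    return state + dt * tendency_fn(state)

def rollout(state0, tendency_fn, n_steps, dt=1.0):
    """Autoregressive rollout: each predicted state is fed back as input."""
    states = [state0]
    for _ in range(n_steps):
        states.append(euler_step(states[-1], tendency_fn, dt))
    return np.stack(states)

def spectral_tendency_penalty(pred_tend, true_tend):
    """Mean absolute difference between the zonal amplitude spectra of the
    predicted and reference tendencies (FFT along the last, longitude axis)."""
    pred_spec = np.abs(np.fft.rfft(pred_tend, axis=-1))
    true_spec = np.abs(np.fft.rfft(true_tend, axis=-1))
    return float(np.mean(np.abs(pred_spec - true_spec)))

def toy_tendency(state):
    """Hypothetical stand-in for the network: linear relaxation toward zero."""
    return -0.1 * state

state0 = np.ones((4, 8))  # hypothetical (lat, lon) grid
traj = rollout(state0, toy_tendency, n_steps=3)
penalty = spectral_tendency_penalty(toy_tendency(state0), toy_tendency(state0))
```

In this sketch the integrator is "hard" in the sense that the update rule is fixed outside the learned function, so the model cannot bypass it; the penalty term would be added to the usual data loss during training.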

Paper Structure (21 sections, 6 equations, 16 figures, 4 tables, 2 algorithms)

Figures (16)

  • Figure 1: A schematic of LUCIE showing the training stage of the model (left panel), the inference stage for an individual ensemble member (top right), and the global annual mean temperature for $T_{0.95}$ for the first 10 years of inference, with ERA5 from 2010-2019 for reference (bottom right).
  • Figure 2: Ensemble mean annual climatology bias of LUCIE for selected variables with respect to the 10-year period of ERA5 and SPEEDY from 2000-2010. LUCIE's climatology is averaged over both the 100 years of simulation and the 100 ensemble members.
  • Figure 3: Zonal mean climatology for the 100 individual LUCIE ensemble members (thin blue lines), the LUCIE ensemble mean (thick blue lines), SPEEDY (dashed yellow lines), and ERA5 from 2000-2010 (thick red lines).
  • Figure 4: $100$-year averaged diurnal range of $T_{0.95}$ for $100$ ensemble members, compared to the $10$-year average of ERA5 and SPEEDY from 2000-2010.
  • Figure 5: $100$-year averaged annual temperature range for $100$ ensemble members, compared to the $10$-year average of ERA5 and SPEEDY from 2000-2010.
  • ...and 11 more figures