LUCIE: A Lightweight Uncoupled ClImate Emulator with long-term stability and physical consistency for O(1000)-member ensembles
Haiwen Guan, Troy Arcomano, Ashesh Chattopadhyay, Romit Maulik
TL;DR
LUCIE is a lightweight, fully data-driven climate emulator that produces long-term, stable, physically consistent simulations after training on as little as $2$ years of $6$-hourly ERA5 data. It combines a spherical Fourier neural operator backbone with a hard Euler integration constraint, a spectral regularizer on tendencies, and a novel Aggregated Gradient Method that enables data-efficient training. The model produces $100$-year simulations with $100$ ensemble members that closely match ERA5 climatology and variability, supports estimation of extremes, and can be trained on a single A100 GPU in about $2.4$ hours, enabling rapid exploration of climate questions. LUCIE still shows limitations in some variability modes (e.g., Kelvin waves, NAM/SAM structure) and in extreme-event tails, pointing to future work on 3D humidity, ocean coupling, and extension to nonstationary climates.
Abstract
We present LUCIE, a lightweight, easy-to-train, low-resolution, fully data-driven climate emulator that can be trained on as little as $2$ years of $6$-hourly ERA5 data. Unlike most state-of-the-art AI weather models, LUCIE remains stable and physically consistent for $100$ years of autoregressive simulation with $100$ ensemble members. The long-term mean climatology of temperature, wind, precipitation, and humidity in LUCIE's simulations matches that of ERA5, as does the variability. We demonstrate how well extreme weather events and their return periods can be estimated from a large ensemble of long-term simulations. We then discuss an improved training strategy with three components: a hard-constrained first-order integrator that suppresses autoregressive error growth, a novel spectral regularization strategy that better captures fine-scale dynamics, and an optimization algorithm that enables data-limited training of the emulator (as little as $2$ years of $6$-hourly data) without losing stability or physical consistency. We also provide a scaling experiment that relates LUCIE's long-term bias to the number of training samples. Importantly, LUCIE is an easy-to-use model that can be trained in just $2.4$ hours on a single A100 GPU. This allows multiple experiments that explore scientific questions requiring large ensembles of long-term simulations, e.g., the impact of different variables on the simulation, the dynamic response to external forcing, and the estimation of extreme weather events, among others.
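To illustrate what a hard-constrained first-order integrator means in an autoregressive setting, the following is a minimal sketch, not LUCIE's actual implementation. It assumes the learned network outputs a tendency that is added to the current state through an explicit Euler update, $x_{t+1} = x_t + \Delta t \, f(x_t)$. The names `euler_step`, `rollout`, and `toy_tendency`, the array shapes, and the value of `dt` are all hypothetical; `toy_tendency` is a simple linear damping term standing in for the trained network.

```python
import numpy as np

def euler_step(state, tendency_fn, dt=1.0):
    """Explicit Euler update built into the model: x_{t+1} = x_t + dt * f(x_t)."""
    return state + dt * tendency_fn(state)

def rollout(state, tendency_fn, n_steps, dt=1.0):
    """Autoregressive rollout: each predicted state is fed back as the next input."""
    states = [state]
    for _ in range(n_steps):
        state = euler_step(state, tendency_fn, dt)
        states.append(state)
    return np.stack(states)  # shape: (n_steps + 1, *state.shape)

def toy_tendency(x):
    """Hypothetical stand-in for the learned tendency network (linear damping)."""
    return -0.5 * x

# Illustrative state layout (variables, grid points); not LUCIE's real grid.
x0 = np.ones((2, 4))
traj = rollout(x0, toy_tendency, n_steps=3, dt=0.1)
print(traj.shape)
```

Because the update is part of the model's forward pass rather than a loss penalty, every predicted state is by construction the previous state plus a learned increment, which is one way to read "hard-constrained" here.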
