AtmoRep: A stochastic model of atmosphere dynamics using large scale representation learning

Christian Lessig; Ilaria Luise; Bing Gong; Michael Langguth; Scarlet Stadtler; Martin Schultz

TL;DR

AtmoRep introduces a task-independent stochastic atmosphere model built as a large transformer with 4D space-time tokens and a multi-field Multiformer backbone. Trained via a novel self-supervised masking objective on ERA5, it outputs ensemble-based probabilistic forecasts $p_{\theta}( y \vert x , \alpha )$, enabling intrinsic capabilities such as nowcasting, temporal interpolation, and counterfactual analysis without task-specific training. The model demonstrates state-of-the-art performance for short-term forecasting, robust model correction against higher-resolution inputs, and the first AI-driven counterfactuals in atmospheric science, with extension pathways to downscaling and data assimilation. By leveraging large observational records and a scalable architecture, AtmoRep offers a data-driven complement to first-principles models, with the potential to democratize access to historical atmospheric dynamics for diverse applications and inquiries.

Abstract

The atmosphere affects humans in a multitude of ways, from loss of life due to adverse weather effects to long-term social and economic impacts on societies. Computer simulations of atmospheric dynamics are, therefore, of great importance for the well-being of our and future generations. Here, we propose AtmoRep, a novel, task-independent stochastic computer model of atmospheric dynamics that can provide skillful results for a wide range of applications. AtmoRep uses large-scale representation learning from artificial intelligence to determine a general description of the highly complex, stochastic dynamics of the atmosphere from the best available estimate of the system's historical trajectory as constrained by observations. This is enabled by a novel self-supervised learning objective and a unique ensemble that samples from the stochastic model with a variability informed by the one in the historical record. The task-independent nature of AtmoRep enables skillful results for a diverse set of applications without specifically training for them and we demonstrate this for nowcasting, temporal interpolation, model correction, and counterfactuals. We also show that AtmoRep can be improved with additional data, for example radar observations, and that it can be extended to tasks such as downscaling. Our work establishes that large-scale neural networks can provide skillful, task-independent models of atmospheric dynamics. With this, they provide a novel means to make the large record of atmospheric observations accessible for applications and for scientific inquiry, complementing existing simulations based on first principles.

Paper Structure (51 sections, 26 equations, 21 figures, 5 tables)

Nowcasting
Temporal interpolation
Model correction
Counterfactuals
Methods
Datasets
Model formulation
Ensemble
Training and Loss
Evaluation
Nowcasting
Counterfactuals
Downscaling
Bias corrections
Code availability
...and 36 more sections

Figures (21)

Figure 1: The AtmoRep model provides a numerical representation $p_{\theta}( y \vert x , \alpha)$ of the conditional probability $p(y \vert x , \alpha)$ for atmospheric states $x$, $y$ subject to external conditions $\alpha$, e.g. the time of $x$, $y$ or their location on the globe. It is implemented as a transformer neural network with $3.5$ billion parameters and trained on the ERA5 reanalysis (top left). For training, local space-time neighborhoods are randomly sampled. The neighborhoods are subdivided into smaller patches, called tokens, and the self-supervised learning task is to reconstruct randomly masked or distorted patches (bottom left, gray patches). An ensemble of prediction heads is used to sample from the AtmoRep core model and provide probabilistic predictions for possible states consistent with the unmasked tokens (bottom right). The ensemble spread that is learned during training arises from the intrinsic variability of the data, i.e. the fact that similar atmospheric states $x(t)$ have different associated states $y$, for example at a fixed offset in time (top right).
Figure 2: AtmoRep can be used for a diverse set of applications without task-specific training (shaded areas depict one standard deviation). Nowcasting: Short-term forecasting can be realized by masking tokens at the future-most time step(s) (bottom right inset). Skill is compared to Pangu-Weather and ECMWF's IFS for zonal velocity, temperature and specific humidity. AtmoRep results are shown for a pre-trained model and one with modest fine-tuning for the task. Model correction: AtmoRep is robust to out-of-distribution input. We exploit this for model correction by using output from IFS as input to AtmoRep. Our model faithfully handles the data, preserving the higher frequency content (top left), and shifts the distribution towards the ERA5 one (right). Temporal interpolation: Temporal interpolation is accomplished by masking tokens in the middle of the temporal domain. Performance is compared to linear interpolation. Counterfactuals: Using initial conditions from, e.g., the period $(2017,2022)$ but prescribed as being from $(1979,1984)$ by using the external conditions $\alpha$ allows for the generation of counterfactuals. The plot shows the difference between the original and the counterfactual distributions, as well as the shape of the full distributions.
Figure 3: Results for downscaling from ERA5 to temperature at $2\,\mathrm{m}$ in the COSMO REA6 dataset for a region in central-eastern Europe. For AtmoRep, zonal and meridional velocities as well as temperature were used as input (at model level $137$, approximately $1000 \, \mathrm{hPa}$). The top row shows the RMSE as well as the spectrum compared to the results obtained with the GAN proposed by Stengel et al. (2020) and retrained for our setup (see the supplementary material for details). At the bottom we show three examples of downscaled fields (third row) as well as the ERA5 input (top row) and the COSMO REA6 reference (second row). Also shown is the difference between COSMO REA6 and the downscaled field provided by AtmoRep (bottom row).
Figure 4: Left: precipitation forecast for ERA5 (left), AtmoRep fine-tuned (center) and RADKLIM (right) for a 3h forecast in 2019. Right: Comparison of the mean squared error (MSE), Equitable Threat Score (ETS), Peirce Skill Score (PSS) and Frequency Bias Indicator (FBI) for ERA5 and the fine-tuned AtmoRep, using RADKLIM data as ground truth and obtained by averaging over predictions from 2019. The bottom right part shows the distribution of hourly accumulated total precipitation for AtmoRep, ERA5 and RADKLIM.
Figure 5: Overview of the AtmoRep neural network architecture. Its core (right, bluish) consists of a stack of encoder-decoder transformers, one per physical field, coupled through cross-attention. UNet-like connections (dark blue) between encoder and decoder are used to facilitate learning a multi-resolution representation. The external conditions $\alpha$ are encoded using a linear layer and appended to the network input. The embedding network is a linear layer, preceded by a local positional encoding (not shown) that is based on the sine/cosine encoding from the original transformer work but extended to the four-dimensional domain considered in AtmoRep. Each member of the ensemble of prediction heads consists of a linear layer. A different random initialization of each linear layer and training with our novel ensemble loss is sufficient to prevent mode collapse. Further details on the network architecture can be found in the supplementary material.
...and 16 more figures
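
The Figure 2 caption describes how different tasks are obtained purely through the choice of masked tokens: nowcasting masks the future-most time step(s), while temporal interpolation masks tokens in the middle of the temporal domain. The following is a minimal sketch of that idea, not the authors' implementation. The function name, the task labels, and the simplification to one boolean per time step (applied to all spatial tokens at that step) are assumptions made here for illustration.

```python
def masked_time_steps(num_time_steps, task, num_masked=1):
    """Return one boolean per time step; True marks a step whose tokens are masked.

    Hypothetical helper: the name, the task labels and the per-time-step
    simplification are illustrative assumptions, not part of AtmoRep.
    """
    if not 1 <= num_masked < num_time_steps:
        raise ValueError("num_masked must be at least 1 and smaller than num_time_steps")

    if task == "nowcasting":
        # Mask the future-most time step(s).
        start = num_time_steps - num_masked
    elif task == "temporal_interpolation":
        # Mask a contiguous block centered in the temporal domain.
        start = (num_time_steps - num_masked) // 2
    else:
        raise ValueError(f"unknown task: {task!r}")

    return [start <= t < start + num_masked for t in range(num_time_steps)]


# Example with six time steps and two masked steps per task.
nowcast_mask = masked_time_steps(6, "nowcasting", num_masked=2)
interp_mask = masked_time_steps(6, "temporal_interpolation", num_masked=2)
```

In this sketch the model itself is unchanged between tasks; only the positions of the masked tokens differ, which is the sense in which the captions describe the applications as requiring no task-specific training.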
