Stochastic Latent Transformer: Efficient Modelling of Stochastically Forced Zonal Jets

Ira J. S. Shokar; Rich R. Kerswell; Peter H. Haynes

TL;DR

The paper addresses efficient probabilistic modelling of stochastically forced zonal jets governed by stochastic partial differential equations (SPDEs), focusing on the zonal mean flow $U(y,t)$ and its interaction with eddies. It introduces the Stochastic Latent Transformer (SLT), which combines an autoencoder built from Translation Equivariant Pointwise Convolution (TEPC) layers, giving a phase-aligned latent representation $Z$, with a stochastic transformer that evolves $Z$ via $Z_{t+1} = \mathcal{T}_\varphi[Z_{t-L:t}, \epsilon]$. The model is trained using the Continuous Ranked Probability Score (CRPS) and a spectral loss. The approach reproduces short- and long-term statistics, matching spectral properties and transition-event distributions, while delivering a five-order-of-magnitude speedup over direct numerical simulation. This makes large ensembles affordable for uncertainty quantification of spontaneous jet-transition events. The framework invites extensions to higher-dimensional geophysical flows and to transfer learning across regimes.
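The update rule $Z_{t+1} = \mathcal{T}_\varphi[Z_{t-L:t}, \epsilon]$ implies an autoregressive rollout over a sliding window of latent states, with fresh noise drawn at every step. The sketch below illustrates only that loop. The function `toy_step` is a hypothetical stand-in for the trained transformer, and the names `rollout` and `ensemble`, the array shapes, and the noise dimension are assumptions, not taken from the paper.

```python
import numpy as np

def toy_step(history, eps):
    """Hypothetical stand-in for T_phi (NOT the paper's model):
    a damped mean of the latent history plus a small noise term."""
    return 0.9 * history.mean(axis=0) + 0.1 * eps

def rollout(step_fn, history, n_steps, rng):
    """Apply Z_{t+1} = step_fn(Z_{t-L:t}, eps) autoregressively.

    history: array of shape (L + 1, D) holding Z_{t-L}, ..., Z_t.
    Returns an array of shape (n_steps, D) with the forecast states.
    """
    window = np.array(history, dtype=float)
    forecast = []
    for _ in range(n_steps):
        eps = rng.standard_normal(window.shape[1])   # eps ~ N(0, 1)
        z_next = step_fn(window, eps)
        forecast.append(z_next)
        # Slide the window: drop the oldest state, append the new one.
        window = np.vstack([window[1:], z_next])
    return np.stack(forecast)

def ensemble(step_fn, history, n_steps, n_members, seed=0):
    """Roll out several members from the same history with different noise."""
    rng = np.random.default_rng(seed)
    return np.stack(
        [rollout(step_fn, history, n_steps, rng) for _ in range(n_members)]
    )

if __name__ == "__main__":
    z_hist = np.zeros((4, 8))            # L + 1 = 4 past states, D = 8
    members = ensemble(toy_step, z_hist, n_steps=10, n_members=5)
    print(members.shape)                 # (members, steps, latent dimension)
```

Because every member starts from the same history and differs only in its noise draws, the spread of `members` reflects the stochastic forcing alone, which is the property that makes cheap ensembles useful for uncertainty quantification.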

Abstract

We present a novel probabilistic deep learning approach, the 'Stochastic Latent Transformer' (SLT), designed for the efficient reduced-order modelling of stochastic partial differential equations. Stochastically driven flow models are pertinent to a diverse range of natural phenomena, including jets on giant planets, ocean circulation, and the variability of midlatitude weather. However, much of the recent progress in deep learning has predominantly focused on deterministic systems. The SLT comprises a stochastically-forced transformer paired with a translation-equivariant autoencoder, trained towards the Continuous Ranked Probability Score. We showcase its effectiveness by applying it to a well-researched zonal jet system, where the interaction between stochastically forced eddies and the zonal mean flow results in a rich low-frequency variability. The SLT accurately reproduces system dynamics across various integration periods, validated through quantitative diagnostics that include spectral properties and the rate of transitions between distinct states. The SLT achieves a five-order-of-magnitude speedup in emulating the zonally-averaged flow compared to direct numerical simulations. This acceleration facilitates the cost-effective generation of large ensembles, enabling the exploration of statistical questions concerning the probabilities of spontaneous transition events.
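For readers unfamiliar with the Continuous Ranked Probability Score, a minimal sketch of the standard ensemble estimator, $\mathrm{CRPS} \approx \mathbb{E}|X - y| - \tfrac{1}{2}\mathbb{E}|X - X'|$, is given below. This is the generic textbook estimator. Whether the paper uses exactly this form, and how it combines the score with its spectral loss, is not stated in this summary, so the function name `crps_ensemble` and the array layout are assumptions.

```python
import numpy as np

def crps_ensemble(members, target):
    """Ensemble estimate of the CRPS (generic form, assumed here).

    members: array of shape (m, ...) with m ensemble forecasts.
    target:  array of shape (...) with the verifying observation.
    Returns the CRPS per element, with the shape of `target`.
    """
    members = np.asarray(members, dtype=float)
    target = np.asarray(target, dtype=float)
    # Accuracy term: mean absolute error of the members against the target.
    skill = np.mean(np.abs(members - target), axis=0)
    # Spread term: mean absolute difference over all member pairs.
    pairwise = np.abs(members[:, None] - members[None, :])
    spread = np.mean(pairwise, axis=(0, 1))
    return skill - 0.5 * spread

if __name__ == "__main__":
    print(crps_ensemble([0.0, 2.0], 1.0))   # two members bracketing the target
    print(crps_ensemble([3.0], 1.0))        # one member: reduces to |3 - 1|
```

With a single member the spread term vanishes and the score reduces to the absolute error, so the CRPS can be read as a probabilistic generalisation of the MAE that rewards ensembles for being both accurate and appropriately dispersed.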

Paper Structure (14 sections, 15 equations, 11 figures)

Introduction
Methods
Autoencoder - Translation Equivariant Pointwise Convolution
Stochastic Latent Transformer
Training
Model Hyperparameters
Results
Short Term Evaluation
Long Time Evolutions
Characterising Transition Events
Conclusion and Future Work
Numerical simulations of geostrophic turbulence
Comparisons With Other Architectures
Supplementary Figures

Figures (11)

Figure 1: Latitude-time plots of $U(y,t)$ displaying an ensemble of numerical integrations and neural network emulations. a-h exhibit numerical integrations with identical initial conditions up to $t=10$ (indicated by the dotted line) and distinct realisations of random noise, $\xi$, after $t=10$, spanning a forecast period of 500 time units. i-p showcase neural network emulations with the same initial conditions as a-h, but with different noise histories $\epsilon \sim \mathcal{N}(0,1)$ after $t=10$. Numerical integration employs a time step of $\delta t=4\times10^{-4}$, with a-h displayed at coarsened time intervals, while the neural network operates at unit time intervals $\Delta t=1$. Yellow indicates positive $U(y,t)$, signifying eastward jets. Notable events within these jets include nucleation, coalescence, and latitudinal translation. The neural network captures the expected system features and produces plausible evolutions.
Figure 2: Schematic of the Stochastic Latent Transformer (SLT) architecture and its components. a Translation Equivariant Pointwise 1D Convolution (TEPC) Layer. The layer convolves inputs $X$ with learned weights $W$ in the frequency domain, with weights $W$ learned irrespective of the phase, $\phi$ ($\phi$ is the argument of the first mode of $\hat{X}$). b Stochastic Latent Transformer (SLT) architecture. Solid arrows indicate the forward pass. The encoder employs two TEPC layers with a nonlinear activation function, $\sigma$, to encode the short time history of $U_{t-L:t}$. The resulting $Z_{t-L:t}$ is fed into the Stochastic Transformer (ST) for forecasting $\tilde{Z}_{t+1}$. The dotted line indicates the autoregressive flow ($\tilde{Z}_{t-L+1:t+1}$) for forecasting subsequent steps up to $\tilde{Z}_{t_{max}}$ during inference. The decoder then transforms $\tilde{Z}_{t+1}$ (or $\tilde{Z}_{t+1:t_{max}}$) back to the physical space $\tilde{U}_{t+1}$ (or $\tilde{U}_{t+1:t_{max}}$), mirroring the encoder's architecture. c Stochastic Transformer (ST) architecture. The weights are translation invariant, as with the TEPC, with the phase, $\phi$, at time $t$ removed. The green block represents the random noise vector, $\epsilon \sim \mathcal{N}(0, 1)$, and is concatenated with the latent space-time histories $Z_{t-L:t}$. The architecture consists of $N$ transformer blocks, where 'MHA' is multi-headed attention using scaled dot-product attention (see Figure 3 for details), 'Linear' is a learned linear transformation and $\sigma$ is the nonlinear activation function. Further details are outlined in the Methods section.
Figure 3: Schematic of stochastic multi-headed attention. The first of the $N$ transformer blocks incorporates a stochastic variant of multi-headed attention. Here $\epsilon \sim \mathcal{N}(0, 1)$ is introduced as an additional sequence member, $L$ represents the length of the latent time history, and $D_{\mathcal{M}}$ denotes the latent dimension. Weights, $W$, linearly transform inputs into Q (Query), K (Key), and V (Value) vectors. The $\epsilon$ input is identical for both the K and V vectors. Dotted lines in the A (Attention) matrices separate the additional row introduced by cross-attention with the forcing $\epsilon$ from the self-attention of the latent time histories. This illustrates the case where the batch size = 1 and number of heads = 1, with reshaping implemented to extend this to multiple heads. The non-stochastic self-attention in the following transformer blocks operates in exactly the same manner, simply without the concatenation of $\epsilon$.
Figure 4: Performance evaluation over short time scales. a presents the Mean Absolute Error (MAE) for short-term tracking ability. Green represents an ensemble of 7 numerical integrations, while red shows 7 emulations from the SLT with respect to the reference truth trajectory (Figure 1a). Blue indicates an ensemble produced by a Variational Autoencoder (VAE). b showcases ensemble variation. Using these evaluation metrics, we observe that the SLT exhibits strong agreement with the numerical integration.
Figure 5: Assessing model faithfulness over long-term evolutions. a presents a long-term evolution obtained through numerical integration, spanning 20,000 observable time units (from a longer evolution of $t_{\text{max}}=10^{6}$). b displays a long-term evolution generated by the SLT, showing qualitative similarity to a with the occurrence of nucleation and coalescence events, as well as latitudinal translation. c illustrates a long-term evolution produced by a Temporal Variational Autoencoder (VAE) that becomes unstable. d depicts a long-term evolution spanning 5,000 time increments generated by a VAE with adversarial training, which suffers from mode collapse, resulting in cyclically repeating features. These emulations demonstrate the robustness of the autoregressive SLT in capturing and maintaining realistic long-term dynamics, which could not be replicated by existing architectures.
...and 6 more figures
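
The Figure 2 caption defines the phase $\phi$ as the argument of the first Fourier mode of the input and says the weights are learned irrespective of it. One way to realise such a phase removal, offered here only as an assumed illustration, is to rotate each mode $k$ by $e^{-ik\phi}$ so that the first mode becomes real and positive. The summary does not spell out the paper's actual operation, and the function name `phase_align` is hypothetical.

```python
import numpy as np

def phase_align(u):
    """Shift a periodic 1D field so that its first Fourier mode has zero phase.

    Assumed illustration of phase removal, not the paper's exact TEPC layer.
    Returns the aligned field and the removed phase phi.
    """
    u = np.asarray(u, dtype=float)
    u_hat = np.fft.rfft(u)
    phi = np.angle(u_hat[1])            # argument of the first mode
    k = np.arange(u_hat.shape[0])       # integer wavenumbers 0, 1, 2, ...
    # Rotating mode k by exp(-i k phi) is a translation in physical space.
    aligned_hat = u_hat * np.exp(-1j * k * phi)
    return np.fft.irfft(aligned_hat, n=u.shape[0]), phi

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    u = rng.standard_normal(64)
    aligned, phi = phase_align(u)
    aligned_shifted, phi_shifted = phase_align(np.roll(u, 7))
    # A translated copy of the field maps to the same aligned field.
    print(np.allclose(aligned, aligned_shifted))
```

Any layer applied to the aligned field sees the same input for every translated copy of $U$. Restoring $\phi$ after the layer would then give a translation-equivariant map, which is the property the TEPC name refers to.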
