
Modeling Spatio-temporal Extremes via Conditional Variational Autoencoders

Xiaoyu Ma, Likun Zhang, Christopher K. Wikle

TL;DR

This work develops a conditional XVAE (cXVAE) to model nonstationary spatio-temporal extremes by integrating climate covariates into a max-infinitely divisible (max-id) extremes framework. The method combines a learnable basis-function representation with a CNN decoder to jointly capture extremal dependence and spatial structure, and it enables counterfactual analysis by intervening on climate drivers. In simulations and on real-world Fire Weather Index (FWI) data conditioned on ENSO, the approach accurately reproduces extremal dependence (χ), spatial extent (ARE), and tail behavior (threshold-weighted CRPS, twCRPS; Q-Q plots) at a feasible computational cost, and it opens clear pathways for scenario analysis. The framework provides a practical tool for risk assessment and climate impact studies, with interpretable outputs, conditional emulations, and actionable counterfactuals under changing climate conditions.
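Tail performance above is summarized with the threshold-weighted CRPS (twCRPS). As a minimal sketch, assuming the common indicator weight $w(z)=\mathbf{1}\{z \ge u\}$ (the paper's weight function and threshold are not given in this summary), the score can be estimated from emulated samples by applying the chaining function $v(z)=\max(z,u)$ to both the samples and the observation and then computing the ordinary energy-form CRPS. The function name tw_crps and the Pareto toy data are hypothetical.

```python
import numpy as np


def tw_crps(samples, obs, threshold):
    """Sample-based twCRPS with weight w(z) = 1{z >= threshold}.

    Chaining-function form: transform with v(z) = max(z, threshold), then
    compute CRPS = E|X - y| - 0.5 E|X - X'| on the transformed values.
    """
    x = np.maximum(np.asarray(samples, dtype=float), threshold)
    y = max(float(obs), threshold)
    term1 = np.mean(np.abs(x - y))
    # 0.5 * E|X - X'| over all m^2 sample pairs (V-statistic form).
    term2 = 0.5 * np.mean(np.abs(x[:, None] - x[None, :]))
    return term1 - term2


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    emulated = rng.pareto(3.0, size=2000) + 1.0  # toy heavy-tailed emulations
    print(tw_crps(emulated, obs=2.5, threshold=2.0))
```

Values below the threshold are mapped to the threshold itself, so forecasts and observations that both stay below it contribute nothing; only the upper tail is scored.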

Abstract

Extreme weather events are widely studied in fields such as agriculture, ecology, and meteorology. The spatio-temporal co-occurrence of extreme events can strengthen or weaken under changing climate conditions. In this paper, we propose a novel approach to modeling spatio-temporal extremes that integrates climate indices via a conditional variational autoencoder (cXVAE). A convolutional neural network (CNN) embedded in the decoder convolves the climatological indices with the spatial dependence in the latent space, so that the decoder depends on the climate variables. We make three main contributions. First, we demonstrate through extensive simulations that the proposed conditional XVAE accurately emulates spatial fields and recovers spatially and temporally varying extremal dependence, at very low computational cost after training. Second, we provide a simple, scalable approach for detecting condition-driven shifts and for assessing whether the dependence structure is invariant to the conditioning variable. Third, when dependence is found to be condition-sensitive, the conditional XVAE supports counterfactual experiments: we intervene on the climate covariate and propagate the resulting change through the learned decoder to quantify differences in joint tail risk, co-occurrence ranges, and return metrics. To demonstrate the model's practical utility and performance in a real-world setting, we apply it to the monthly maximum Fire Weather Index (FWI) over eastern Australia from 2014 to 2024, conditioned on the El Niño/Southern Oscillation (ENSO) index.
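The counterfactual experiment described in the abstract (intervene on the covariate, propagate the change through the decoder, compare joint tail risk) can be illustrated with a deliberately small toy emulator. This is not the cXVAE: the hypothetical decoder dependence_weight maps a climate index $c$ to the weight $a(c)$ of a shared heavy-tailed factor in a bivariate max-linear model, $X_i=\max\{a(c)F,\,(1-a(c))E_i\}$ with $F, E_1, E_2$ i.i.d. unit Fréchet. Both margins then stay unit Fréchet for every $c$, while the tail dependence equals $\chi=a(c)$. The counterfactual value is hypothetically taken to be $-c$; all function names are illustrative.

```python
import numpy as np


def dependence_weight(c):
    """Hypothetical decoder: maps a climate index c to a dependence weight a(c) in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-c))


def emulate_pair(c, n, rng):
    """Draw n replicates of a toy bivariate max-linear field under condition c.

    X_i = max(a F, (1 - a) E_i), i = 1, 2, with F, E_1, E_2 i.i.d. unit Frechet.
    Margins are unit Frechet for every c; the shared factor F sets chi = a(c).
    """
    a = dependence_weight(c)
    # A unit Frechet variable is 1 / Exp(1).
    f = 1.0 / rng.exponential(size=n)
    e = 1.0 / rng.exponential(size=(n, 2))
    return np.maximum(a * f[:, None], (1.0 - a) * e)


def joint_exceedance(x, u):
    """Empirical joint tail risk P(X_1 > u, X_2 > u)."""
    return np.mean((x[:, 0] > u) & (x[:, 1] > u))


def joint_exceedance_exact(c, u):
    """Closed form for the toy model: 1 - 2 exp(-1/u) + exp(-(2 - a)/u)."""
    a = dependence_weight(c)
    return 1.0 - 2.0 * np.exp(-1.0 / u) + np.exp(-(2.0 - a) / u)


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    c_obs = 1.5    # observed condition (illustrative value)
    c_cf = -c_obs  # hypothetical counterfactual intervention
    u = 20.0       # high threshold on the unit-Frechet scale
    for label, c in [("observed", c_obs), ("counterfactual", c_cf)]:
        x = emulate_pair(c, 200_000, rng)
        print(label, joint_exceedance(x, u), joint_exceedance_exact(c, u))
```

Because the margins are identical under both conditions, any gap in joint exceedance between the two runs is driven by the covariate-dependent dependence alone, which is the kind of contrast a counterfactual analysis targets.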

Paper Structure

This paper contains 33 sections, 2 theorems, 90 equations, 15 figures.

Key Result

Theorem 3.1

Let $\{Y(\boldsymbol{s}):\boldsymbol{s}\in\mathcal{S}\}$ be a nonnegative random field that satisfies, for each $\boldsymbol{s}$, $\mathbb{E}\{Y(\boldsymbol{s})^{\alpha_0+\eta}\}<\infty$ for some $\eta>0$, and, for all pairs $(\boldsymbol{s}_1,\boldsymbol{s}_2)$, $\mathbb{E}\{Y(\boldsymbol{s}_1)^{\alpha\ldots}$ … where $\{\epsilon_F(\boldsymbol{s})\}$ and $\{\epsilon_L(\boldsymbol{s})\}$ are i.i.d. across …
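The statement above is truncated, but its ingredients (a nonnegative field $Y$ with a finite moment of order $\alpha_0+\eta$, and two noise processes $\epsilon_F$ and $\epsilon_L$) resemble the classical setting of Breiman's lemma: if $\epsilon$ is independent of $Y$ and has a regularly varying tail with index $\alpha_0$, then $P(\epsilon Y > x) \sim \mathbb{E}(Y^{\alpha_0})\,P(\epsilon > x)$ as $x\to\infty$. The sketch below checks this numerically for a Fréchet noise and a log-Laplace noise. Everything here is an assumption made for illustration, not the paper's proof: the multiplicative form $X=\epsilon Y$, reading $\epsilon_F$ as Fréchet and $\epsilon_L$ as log-Laplace, $\alpha_0=2$, and a lognormal $Y$.

```python
import numpy as np

ALPHA0 = 2.0  # assumed tail index of the noise
SIGMA = 0.25  # assumed log-scale sd of the light-tailed factor Y


def surv_frechet(t, alpha=ALPHA0):
    """P(eps_F > t) for Frechet(alpha) noise: 1 - exp(-t^(-alpha))."""
    return -np.expm1(-np.power(t, -alpha))


def surv_loglaplace(t, alpha=ALPHA0):
    """P(eps_L > t) when log(eps_L) ~ Laplace(0, 1/alpha).

    The upper tail is exactly 0.5 * t^(-alpha) for t >= 1.
    """
    t = np.asarray(t, dtype=float)
    return np.where(t >= 1.0, 0.5 * t ** (-alpha), 1.0 - 0.5 * t ** alpha)


def tail_ratio(surv, x, y):
    """Conditional Monte Carlo estimate of P(eps * Y > x) / P(eps > x).

    Conditioning on Y gives P(eps * Y > x) = E[surv(x / Y)], averaged over draws y.
    """
    return np.mean(surv(x / y)) / surv(x)


if __name__ == "__main__":
    rng = np.random.default_rng(7)
    y = np.exp(rng.normal(0.0, SIGMA, size=100_000))  # lognormal Y: all moments finite
    target = np.exp(0.5 * (ALPHA0 * SIGMA) ** 2)       # E[Y^alpha0] for lognormal Y
    for x in (10.0, 100.0, 1000.0):
        print(x, tail_ratio(surv_frechet, x, y), tail_ratio(surv_loglaplace, x, y), target)
```

Both ratios settle near the same constant $\mathbb{E}(Y^{\alpha_0})$, so in this toy setting the two products have tails that differ only by the ratio of the noise tail constants (here 1 for Fréchet versus 0.5 for log-Laplace).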

Figures (15)

  • Figure 1: Conditional XVAE architecture with three main components: an encoder, a latent space, and a CNN decoder. Encoder (top left): for each time $t=1,\ldots,n_t$, the input spatial field $\boldsymbol{X}_t$ observed at $n_s$ locations is mapped through a dense layer with Softplus activation to produce a mean vector $\boldsymbol{\mu}_t$ and a log-variance vector $\log \boldsymbol{\sigma}_t$, each of dimension $K$. Latent space (top right): latent variables are constructed on the log scale with transformed conditions $g(\boldsymbol{c}_t)$ and fused with $\boldsymbol{c}_t$ as in Eq. \ref{eqn:cXVAE_fuse}. CNN decoder (red box): the fused latent variables are stacked and transposed to form structured inputs. Convolution and max-pooling layers extract feature maps (e.g., from $3 \times 2K$ to $40 \times 2K$), which are flattened and passed through a dense layer to yield coefficients $\{\xi_{1t}, \xi_{2t}, \ldots, \xi_{Mt}\}$, as defined in Eq. \ref{eqn:CNNdecoder}. The generative process is summarized in the bottom right. The tilting parameters $\boldsymbol{\theta}_t$ that control extremal dependence are estimated via pre-specified basis functions $\varphi_{mt}$. The denoised response $\boldsymbol{Y}_t$ is obtained as a linear combination of the latent variables $\boldsymbol{Z}_t$ with learnable weights $\boldsymbol{W}$, and the response surface $\boldsymbol{X}_t$ is generated by introducing log-Laplace noise $\boldsymbol{\epsilon}_t$.
  • Figure 2: (a) The smoothed ENSO time series $c_t$ (black dots), obtained by applying a 5-month moving average to the raw ENSO series. (b) Simulated $\boldsymbol{\theta}_t(c_t)$ for $c_t= 0.859$ in December 1982 (first red dashed line). (c) Simulated $\boldsymbol{\theta}_t(c_t)$ for $c_t= 0.482$ in August 1986 (second red dashed line). (d) Simulated $\boldsymbol{\theta}_t(c_t)$ for $c_t= 0.118$ in December 1998 (third red dashed line).
  • Figure 3: First row: ENSO indices and counterfactual (flipped) ENSO indices from October 1996 to March 1999 (the shaded time window in Figure \ref{fig:ENSO}). Second row: true $\boldsymbol{\theta}_t$ at three selected times. Third row: estimated $\boldsymbol{\theta}_t$ at the same three times. Fourth row: true $\log(\boldsymbol{X}_t)$ at the three times. Fifth row: emulated $\log(\boldsymbol{X}_t)$ at the three times.
  • Figure 4: Kernel density contour plots of emulated samples at two selected spatial locations under original and counterfactual ENSO conditions. Each panel corresponds to a different time: December 1997 (left), June 1998 (middle), and February 1999 (right). The counterfactual ENSO signal induces visible differences in the distributions, most notably in December 1997 and February 1999, where the counterfactual (red) and emulation (blue) contours clearly diverge.
  • Figure 5: $\chi$-coefficients for short (distance 0.5), medium (distance 3) and long (distance 6) spatial lags. The emulated curves (blue) closely match the true data (red), capturing both strong short-range and weak long-range extremal dependence.
  • ...and 10 more figures
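Figure 5 compares $\chi$-coefficients at three spatial lags. A common empirical estimator, assumed here because the paper's exact threshold and distance binning are not given in this summary, rank-transforms each site's series to the uniform scale and pools joint exceedances over site pairs at a given distance $h$: $\chi_u(h) = P\{U(\boldsymbol{s}) > u,\, U(\boldsymbol{s}') > u\}/(1-u)$ with $\lVert\boldsymbol{s}-\boldsymbol{s}'\rVert \approx h$. The names to_uniform and empirical_chi, the tolerance tol, and the toy data are hypothetical.

```python
import numpy as np


def to_uniform(x):
    """Rank-transform each column (site) of an (n_time, n_site) array to the (0, 1) scale."""
    n = x.shape[0]
    ranks = np.argsort(np.argsort(x, axis=0), axis=0) + 1  # ranks 1..n per site
    return ranks / (n + 1.0)


def empirical_chi(x, coords, h, u, tol=0.25):
    """Empirical chi_u(h), pooled over site pairs whose distance is within tol of h.

    chi_u(h) = P(U(s) > u, U(s') > u) / (1 - u), with U the rank-transformed data.
    """
    uu = to_uniform(x)
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    # Upper triangle only (k=1): each unordered pair once, no self-pairs.
    i, j = np.where(np.triu(np.abs(d - h) <= tol, k=1))
    if i.size == 0:
        raise ValueError("no site pairs at this lag")
    joint = np.mean((uu[:, i] > u) & (uu[:, j] > u))
    return joint / (1.0 - u)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    base = rng.standard_normal(5000)
    # Toy data: two nearby, strongly correlated sites and one distant independent site.
    x = np.column_stack([base, base + 0.1 * rng.standard_normal(5000), rng.standard_normal(5000)])
    coords = np.array([[0.0, 0.0], [0.5, 0.0], [6.0, 0.0]])
    print(empirical_chi(x, coords, h=0.5, u=0.95), empirical_chi(x, coords, h=6.0, u=0.95))
```

Pooling over all pairs within a distance tolerance trades a little bias for lower variance; under independence the estimate tends to $1-u$ rather than 0 at a finite threshold, so curves are usually read across several values of $u$.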

Theorems & Definitions (4)

  • Theorem 3.1: Tail equivalence under noise replacement
  • Remark 1
  • Lemma B.1: Potter bounds for regularly varying tails
  • Proof of Theorem 3.1
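Lemma B.1 is listed only by title. For reference, the standard form of Potter's bounds is: if $f$ is regularly varying at infinity with index $\rho$, then for any $A>1$ and $\delta>0$ there exists $x_0=x_0(A,\delta)$ such that, for all $x,y\ge x_0$,

$$
A^{-1}\min\left\{\left(\frac{y}{x}\right)^{\rho+\delta},\left(\frac{y}{x}\right)^{\rho-\delta}\right\}
\;\le\; \frac{f(y)}{f(x)} \;\le\;
A\max\left\{\left(\frac{y}{x}\right)^{\rho+\delta},\left(\frac{y}{x}\right)^{\rho-\delta}\right\}.
$$

For a tail probability $P(\epsilon > x)$ that is regularly varying with index $-\alpha_0$, take $f(x)=P(\epsilon>x)$ and $\rho=-\alpha_0$. The paper's own formulation of the lemma may differ in its constants or conditions.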