A VAE Approach to Sample Multivariate Extremes

Nicolas Lafon, Philippe Naveau, Ronan Fablet

TL;DR

This paper details a variational autoencoder (VAE) approach for sampling multivariate heavy-tailed distributions, i.e., distributions likely to have extremes of particularly large intensities, and improves the learning of the dependency structure between extremes.

Abstract

Generating accurate extremes from an observational data set is crucial when seeking to estimate risks associated with the occurrence of future extremes which could be larger than those already observed. Applications range from the occurrence of natural disasters to financial crashes. Generative approaches from the machine learning community do not apply to extreme samples without careful adaptation. Meanwhile, asymptotic results from extreme value theory (EVT) give a theoretical framework to model multivariate extreme events, especially through the notion of multivariate regular variation. Bridging these two fields, this paper details a variational autoencoder (VAE) approach for sampling multivariate heavy-tailed distributions, i.e., distributions likely to have extremes of particularly large intensities. We illustrate the relevance of our approach on a synthetic data set and on a real data set of discharge measurements along the Danube river network. The latter shows the potential of our approach for flood risk assessment. In addition to outperforming the standard VAE for the tested data sets, we also provide a comparison with a competing EVT-based generative approach. On the tested cases, our approach improves the learning of the dependency structure between extremes.

Paper Structure (42 sections, 9 theorems, 80 equations, 11 figures, 2 tables)

Key Result

Proposition 5

(Arora et al., 2016; Huster et al., 2021) A neural network $f: \mathbb{R}^n \rightarrow \mathbb{R}$ composed of operations such as ReLUs, leaky ReLUs, linear layers, max pooling, maxout activations, concatenation or addition is a piecewise linear operator with a finite number of linear regions. Therefore, ...
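
The piecewise-linear claim can be checked numerically on a toy example. The sketch below is only an illustration under stated assumptions: the weights, the ray direction, and the helper names (`f`, `along_ray`, `second_difference`) are hypothetical and do not come from the paper. It evaluates a one-hidden-layer ReLU network along a ray and uses second differences to show that the function has kinks near the origin but is exactly affine once the ray has left the last breakpoint.

```python
def relu(z):
    return max(0.0, z)

# Hypothetical toy weights (not from the paper): 2 inputs, 3 hidden ReLU units.
W1 = [[1.0, -1.0], [0.5, 2.0], [-1.0, 0.5]]
B1 = [0.5, -1.0, 2.0]
W2 = [1.0, -2.0, 0.5]
B2 = 0.25

def f(x):
    """One-hidden-layer ReLU network from R^2 to R."""
    hidden = [relu(sum(w * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(W1, B1)]
    return B2 + sum(w * h for w, h in zip(W2, hidden))

def along_ray(t, direction=(1.0, 1.0)):
    """Evaluate f at the point t * direction."""
    return f([t * d for d in direction])

def second_difference(t, step):
    """Zero whenever f is affine on [t, t + 2 * step] along the ray."""
    return along_ray(t + 2 * step) - 2.0 * along_ray(t + step) + along_ray(t)

# Along direction (1, 1) the hidden units switch at t = 0.4 and t = 4.
near_origin = second_difference(0.0, 2.0)   # interval [0, 4] crosses a kink: -2.0
far_out = second_difference(10.0, 10.0)     # interval [10, 30] is kink-free: 0.0
far_slope = (along_ray(20.0) - along_ray(10.0)) / 10.0  # constant slope: -5.0

print(near_origin, far_out, far_slope)
```

Because the number of linear regions is finite, every ray eventually stays inside a single region, which is why the second difference vanishes far from the origin in this toy case.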

Figures (11)

  • Figure 1: How can one sample from observations (blue dots) in extreme regions (black square) to estimate the probability of rare events?
  • Figure 2: VAE scheme to draw a sample $\mathbf{x}^{(i)}$ (green block) from a multivariate regularly varying random vector. Two VAEs are involved (grey areas): one for radius generation (left) and one for angle generation (right). Each VAE relies on its own latent prior distribution (red blocks) and conditional distribution (blue blocks). Arrows indicate causal relations; the arrow between blue blocks shows that the angle is sampled conditionally on the radius.
  • Figure 3: Log-QQ plot of the upper decile of 10,000 radius samples from StdVAE (blue dots), ExtVAE$_r$ (orange dots) and UExtVAE$_r$ (green dots) against the upper decile of the test data set of $R_1$. The log of the true radius, denoted $\log R_1$, is on the x-axis; the log of the estimated radius, denoted $\log \hat{R}_1$, is on the y-axis. The dots should lie close to the blue line.
  • Figure 4: KL divergence between the radius distribution of the benchmarked VAE models and the target heavy-tailed distribution: we display the KL divergence (see Equation \ref{eq: KL div threshold}) above quantile $u$ for $P(R_1>u)$ varying from $0$ to $1$. The compared VAEs are the StdVAE (blue curve), the ExtVAE$_r$ (orange curve) and the UExtVAE$_r$ (green curve). To estimate the KL divergences, 10,000 samples from each distribution are drawn, and $u$ is set to the quantile computed from the samples of $R_1$.
  • Figure 5: Evolution of the tail index $\alpha$ of UExtVAE$_r$ during the training procedure: we report the value of the tail index as a function of the training epochs for training runs from different initial values. The initial values of $\alpha$ are sampled uniformly between $0.5$ and $3$. The true value of $\alpha$ is $1.5$.
  • ...and 6 more figures
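
The radius-then-angle sampling order described for Figure 2 can be sketched without any neural network. The code below is a minimal stand-in under loud assumptions, not the paper's VAE: the radius is drawn from an exact Pareto law with tail index 1.5 (the true value quoted for Figure 5), the angle is a Dirichlet draw on the L1 simplex whose shape depends on the radius through a made-up rule, and the tail index is recovered with the classical Hill estimator rather than learned during training. All function names are hypothetical.

```python
import math
import random

def sample_radius(alpha, rng):
    """Heavy-tailed radius: Pareto with tail index alpha, support [1, inf)."""
    return rng.paretovariate(alpha)

def sample_angle(radius, dim, rng):
    """Angle on the L1 simplex, drawn conditionally on the radius.

    Assumption for illustration only: a Dirichlet law whose shape
    1 + 1/radius tends to 1 as the radius grows, so the angular law
    stabilises for large radii.
    """
    shape = 1.0 + 1.0 / radius
    gammas = [rng.gammavariate(shape, 1.0) for _ in range(dim)]
    total = sum(gammas)
    return [g / total for g in gammas]

def sample_extreme(alpha, dim, rng):
    """Draw the radius first, then the angle given the radius, and recombine."""
    radius = sample_radius(alpha, rng)
    angle = sample_angle(radius, dim, rng)
    return radius, [radius * a for a in angle]

def hill_tail_index(values, k):
    """Hill estimator of the tail index from the k largest values."""
    ordered = sorted(values, reverse=True)
    threshold = ordered[k]
    mean_log_excess = sum(math.log(v / threshold) for v in ordered[:k]) / k
    return 1.0 / mean_log_excess

rng = random.Random(0)
draws = [sample_extreme(alpha=1.5, dim=3, rng=rng) for _ in range(20000)]
radii = [r for r, _ in draws]
alpha_hat = hill_tail_index(radii, k=2000)
print(f"Hill estimate of the tail index: {alpha_hat:.2f}")
```

Because the angle has non-negative components summing to one, the L1 norm of each sample equals its radius, so the heavy tail of the samples is controlled entirely by the radius distribution in this sketch.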

Theorems & Definitions (21)

  • Example 1
  • Remark 1
  • Remark 2
  • Definition 3
  • Definition 4
  • Proposition 5
  • Proposition 6
  • Corollary 7
  • Proposition 8
  • Proposition 11
  • ...and 11 more