A VAE Approach to Sample Multivariate Extremes

Nicolas Lafon; Philippe Naveau; Ronan Fablet

TL;DR

This paper details a variational autoencoder (VAE) approach for sampling multivariate heavy-tailed distributions, i.e., distributions likely to have extremes of particularly large intensities, and improves the learning of the dependency structure between extremes.

Abstract

Generating accurate extremes from an observational data set is crucial when seeking to estimate risks associated with the occurrence of future extremes which could be larger than those already observed. Applications range from the occurrence of natural disasters to financial crashes. Generative approaches from the machine learning community do not apply to extreme samples without careful adaptation. Besides, asymptotic results from extreme value theory (EVT) give a theoretical framework to model multivariate extreme events, especially through the notion of multivariate regular variation. Bridging these two fields, this paper details a variational autoencoder (VAE) approach for sampling multivariate heavy-tailed distributions, i.e., distributions likely to have extremes of particularly large intensities. We illustrate the relevance of our approach on a synthetic data set and on a real data set of discharge measurements along the Danube river network. The latter shows the potential of our approach for flood risk assessment. In addition to outperforming the standard VAE for the tested data sets, we also provide a comparison with a competing EVT-based generative approach. On the tested cases, our approach improves the learning of the dependency structure between extremes.
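
The radius/angle view of a multivariate heavy-tailed vector mentioned in the abstract can be sketched in a few lines. This is only an illustration under stated assumptions, not the paper's VAE: the radius is taken to be Pareto with a hypothetical tail index `alpha`, the angle is drawn uniformly on the positive L1 simplex and independently of the radius, and the function name `sample_heavy_tailed` is invented for this example.

```python
import random

def sample_heavy_tailed(n, dim, alpha, seed=0):
    """Draw n points X = R * Theta with a Pareto(alpha) radius R >= 1
    and an angle Theta on the positive L1 unit simplex.

    Assumption for illustration only: Theta is uniform on the simplex and
    independent of R, which a trained model would not impose.
    """
    rng = random.Random(seed)
    samples = []
    for _ in range(n):
        # Inverse-CDF sampling: P(R > r) = r ** (-alpha) for r >= 1.
        u = 1.0 - rng.random()  # u lies in (0, 1], so the power is finite
        radius = u ** (-1.0 / alpha)
        # Normalised i.i.d. exponentials are uniform on the simplex.
        e = [rng.expovariate(1.0) for _ in range(dim)]
        total = sum(e)
        theta = [v / total for v in e]
        samples.append([radius * t for t in theta])
    return samples

points = sample_heavy_tailed(n=1000, dim=3, alpha=1.5)
radii = [sum(p) for p in points]  # L1 norm, since all coordinates are >= 0
```

A smaller `alpha` gives a heavier tail, so the largest radii in `radii` grow quickly as `alpha` decreases.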

Paper Structure (42 sections, 9 theorems, 80 equations, 11 figures, 2 tables)

Introduction
Background
VAE framework
Univariate Extremes
Multivariate Extremes
Tail properties of Distributions Sampled by Generative Models
Marginal Tail of a Standard VAE
Angular Measure of ReLU Networks Transformation of Random Vectors
Proposed VAE Architecture
Overall strategy
Intuition Behind the Heavy-tailed Radius Sampling Scheme
Sampling from Heavy-tailed Radius Distributions
Conditional Sampling of the Angle given the Radius
Implementation
Neural network parameterizations
...and 27 more sections

Key Result

Proposition 5

(Arora et al., 2016; Huster et al., 2021): A neural network $f: \mathbb{R}^n \rightarrow \mathbb{R}$ composed of operations such as ReLUs, leaky ReLUs, linear layers, max-pooling, maxout activation, concatenation or addition, is a piecewise linear operator with a finite number of linear regions. Therefore ...
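
The piecewise linear property stated in the proposition can be checked numerically on a toy network. The sketch below is an assumption-laden illustration: `tiny_net`, its weights, and the ray direction are all hypothetical. It evaluates the network along a ray and uses a second difference, which vanishes whenever the three evaluation points fall inside the same linear region.

```python
def relu(v):
    return [max(0.0, x) for x in v]

def tiny_net(x):
    """A hypothetical 2-input, 3-hidden-unit ReLU network with scalar output."""
    W1 = [[1.0, -2.0], [0.5, 1.0], [-1.0, 0.5]]
    b1 = [0.5, -1.0, 2.0]
    w2 = [1.0, -0.5, 2.0]
    b2 = 0.25
    pre = [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(W1, b1)]
    hidden = relu(pre)
    return sum(w * h for w, h in zip(w2, hidden)) + b2

def along_ray(t, direction=(1.0, 1.0)):
    # Restrict the network to the ray {t * direction : t >= 0}.
    return tiny_net([t * d for d in direction])

def second_difference(t, h=1.0):
    # Zero whenever t, t + h and t + 2h lie in the same linear region.
    return along_ray(t + 2 * h) - 2 * along_ray(t + h) + along_ray(t)

far_out = second_difference(10.0)   # beyond the last kink (t = 4 on this ray)
near_kink = second_difference(3.5)  # the points straddle the kink at t = 4
```

With these weights the hidden units switch on or off at $t = 0.5$, $t = 2/3$ and $t = 4$ along the chosen ray, so the restricted function is affine for $t > 4$: `far_out` is zero while `near_kink` is not.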

Figures (11)

Figure 1: How to sample from observations (blue dots) in extreme regions (black square) to estimate probability of rare events?
Figure 2: VAE scheme to draw a sample $\mathbf{x}^{(i)}$ (green block) from a multivariate regularly varying random vector. Two VAEs are involved (grey areas): one for radius generation (left) and one for angle generation (right). Each VAE relies on its own latent prior distribution (red blocks) and conditional distribution (blue blocks). Arrows indicate causal relations; the arrow between blue blocks shows that the angle is sampled conditionally on the radius.
Figure 3: Log-QQ plot between the upper decile of 10,000 radius samples from StdVAE (blue dots), ExtVAE$_r$ (orange dots), UExtVAE$_r$ (green dots) and the upper decile of the test data set of $R_1$. The log of the true radius, denoted $\log R_1$, is on the x-axis; the log of the estimated radius, denoted $\log \hat{R}_1$, is on the y-axis. The dots should lie close to the blue line.
Figure 4: KL divergence between the radius distribution of the benchmarked VAE models and the target heavy-tailed distribution: we display the KL divergence (see Equation \ref{eq: KL div threshold}) above quantile $u$ for $P(R_1>u)$ varying from $0$ to $1$. The compared VAEs are the StdVAE (blue curve), the ExtVAE$_r$ (orange curve) and the UExtVAE$_r$ (green curve). To estimate the KL divergences, 10,000 samples from each distribution are drawn, and $u$ is set to the quantile computed from the samples of $R_1$.
Figure 5: Evolution of the tail index $\alpha$ of UExtVAE$_r$ during the training procedure: we report the value of the tail index as a function of the training epochs for training runs from different initial values. The initial values of $\alpha$ are sampled uniformly between $0.5$ and $3$. The true value of $\alpha$ is $1.5$.
...and 6 more figures
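
The log-QQ comparison described in the Figure 3 caption can be reproduced in outline as follows. This is a hedged sketch: the helper `upper_decile_log_qq` is a hypothetical name, both samples are assumed to have the same size, and the two Pareto samples are stand-ins for the test radii and the model-generated radii, not the paper's data.

```python
import math
import random

def upper_decile_log_qq(true_sample, model_sample):
    """Return (log true, log model) pairs for the top 10% of each sample.

    Assumes both samples have the same length and strictly positive values.
    """
    if len(true_sample) != len(model_sample):
        raise ValueError("samples must have the same size")

    def top_decile_logs(sample):
        ordered = sorted(sample)
        start = len(ordered) - len(ordered) // 10  # index of the 90% quantile
        return [math.log(v) for v in ordered[start:]]

    return list(zip(top_decile_logs(true_sample), top_decile_logs(model_sample)))

rng = random.Random(0)
# Stand-in data: two independent Pareto samples with tail index 1.5.
true_r = [rng.paretovariate(1.5) for _ in range(10000)]
model_r = [rng.paretovariate(1.5) for _ in range(10000)]
qq_points = upper_decile_log_qq(true_r, model_r)
```

Plotting `qq_points` against the diagonal gives the kind of picture the caption describes: a model whose radius tail matches the data yields points close to the line $y = x$.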

Theorems & Definitions (21)

Example 1
Remark 1
Remark 2
Definition 3
Definition 4
Proposition 5
Proposition 6
Corollary 7
Proposition 8
Proposition 11
...and 11 more
