Structured linear factor models for tail dependence

Alexis Boulin, Axel Bücher

TL;DR

This work develops a structured framework for modeling tail dependence in high dimensions via linear and max-linear factor models, parameterized by a loading matrix $\bar{A}$ and latent heavy-tailed factors. Under a pure variable assumption, the authors establish identifiability of both the number of factors $K$ and the loading matrix up to permutation. They propose two algorithms, PureVar and HTSP, with non-asymptotic guarantees that also cover the regime where the dimension $d$ exceeds the sample size $n$. A Monte Carlo study and two case studies (dietary intakes and wind gusts) illustrate strengths and limitations, and the authors discuss possible extensions such as a two-stage factor model that relaxes the pure variable constraint. The methods hinge on the stable tail dependence function $L$ and its relation to spectral measures, from which the factor structure is inferred using tail data. Potential application areas include finance, environmental science, and health.

Abstract

A common object to describe the extremal dependence of a $d$-variate random vector $X$ is the stable tail dependence function $L$. Various parametric models have emerged, with a popular subclass consisting of those stable tail dependence functions that arise for linear and max-linear factor models with heavy-tailed factors. The stable tail dependence function is then parameterized by a $d \times K$ matrix $A$, where $K$ is the number of factors and where $A$ can be interpreted as a factor loading matrix. We study estimation of $L$ under an additional assumption on $A$ called the 'pure variable assumption'. Both $K \in \{1, \dots, d\}$ and $A \in [0, \infty)^{d \times K}$ are treated as unknown, which constitutes an unconventional parameter space that does not fit into common estimation frameworks. We suggest two algorithms for estimating $K$ and $A$, and provide finite-sample guarantees for both algorithms. Remarkably, the guarantees allow for the case where the dimension $d$ is larger than the sample size $n$. The results are illustrated with numerical experiments and two case studies.
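To make the parameterization of $L$ by $A$ concrete, the following is a minimal sketch rather than code from the paper. It assumes the standard max-linear setting in which the rows of $A$ sum to one and the factors are independent unit Fréchet, so that $L(x) = \sum_{k=1}^{K} \max_{j} a_{jk} x_j$. The function name `max_linear_stdf` and the matrix `A_example` are hypothetical and chosen only for illustration.

```python
import numpy as np


def max_linear_stdf(A, x):
    """Evaluate L(x) = sum_k max_j a_jk * x_j for a loading matrix A.

    Assumption (not taken from the paper's text): A is a (d, K) matrix with
    non-negative entries and rows summing to one, and x is a non-negative
    vector of length d.
    """
    A = np.asarray(A, dtype=float)
    x = np.asarray(x, dtype=float)
    if A.ndim != 2 or x.shape != (A.shape[0],):
        raise ValueError("A must be (d, K) and x must have length d")
    # Scale row j of A by x_j, take the maximum over rows for each factor,
    # then sum the K column-wise maxima.
    return float(np.sum(np.max(A * x[:, None], axis=0)))


# Illustrative loading matrix with d = 3 variables and K = 2 factors.
# Rows 0 and 1 each load on a single factor; row 2 mixes both factors.
A_example = np.array([
    [1.0, 0.0],
    [0.0, 1.0],
    [0.5, 0.5],
])

if __name__ == "__main__":
    # L evaluated at the all-ones vector is the extremal coefficient.
    print(max_linear_stdf(A_example, np.ones(3)))
```

With unit row sums, $L$ evaluated at a unit vector equals one, which reflects the marginal standardization; values of $L(1, \dots, 1)$ between $1$ and $d$ interpolate between complete tail dependence and tail independence.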

Paper Structure

This paper contains 23 sections, 13 theorems, 148 equations, 7 figures, 2 algorithms.

Key Result

Lemma 2.1

Suppose $\bm X$ is a $d$-variate random vector with continuous marginal cdfs $F_1, \dots, F_d$. Then, the stable tail dependence function $L$ of $\bm X$ exists if and only if the random vector $\bm Y=(Y_1, \dots, Y_d)^\top$ defined by $Y_j=1/(1-F_j(X_j))$ is regularly varying. In that case:

Figures (7)

  • Figure 1: Performance metrics for the linear model with noise across different parameter combinations. The rows depict, in order: (1) recovery rate of latent factors, (2) recovery rate of sparsity, (3) recovery rate of pure variables, (4) TFNP, (5) TFPP, and (6) matrix estimation error. Each metric is plotted as a function of the sample size $n$, comparing dimensions $d \in \{100, 1000\}$ and the three choices of $(k,\kappa, \bar{\kappa})$ described in the main text. Results are stratified by $K=5$ (left column) and $K=20$ (right column).
  • Figure 2: Results for the dietary data set with $d=6$. Left: empirical correlations from $\hat{\mathcal{X}}$ vs. fitted extremal correlations from $\tilde{\mathcal{X}}^{\kappa^*, \bar{\kappa}^*}$. Right: Estimated loading matrix $\stackon[0.5pt]{A}{\hbox{$\bm{\triangle}$}}{}^{\kappa^*, \bar{\kappa}^*}$.
  • Figure 3: Results for the wind speed data set. Left: map of $d=22$ weather stations in Schleswig-Holstein. Right: empirical correlations from $\hat{\mathcal{X}}$ vs. fitted extremal correlations from $\tilde{\mathcal{X}}^{\kappa^*, \bar{\kappa}^*}$.
  • Figure 4: Performance metrics for the linear model without noise across different parameter combinations. The rows depict, in order: (1) recovery rate of latent factors, (2) recovery rate of sparsity, (3) recovery rate of pure variables, (4) TFNP, (5) TFPP, and (6) matrix estimation error. Each metric is plotted as a function of the sample size $n$, comparing dimensions $d \in \{100, 1000\}$ and the three choices of $(k,\kappa, \bar{\kappa})$ described in the main text. Results are stratified by $K=5$ (left column) and $K=20$ (right column).
  • Figure 5: Same as Figure 4, but for the max-linear model without noise.
  • ...and 2 more figures

Theorems & Definitions (28)

  • Lemma 2.1
  • Remark 2.4
  • Proposition 2.6
  • Lemma 2.7
  • Proposition 2.9: Identifiability
  • Proposition 2.10
  • Remark 3.1
  • Theorem 3.2: Statistical guarantees for the PureVar-Algorithm
  • Remark 3.3
  • Theorem 3.4: Statistical guarantees for the HTSP-Algorithm
  • ...and 18 more