Structured linear factor models for tail dependence
Alexis Boulin, Axel Bücher
TL;DR
This work studies extremal dependence in high dimensions through linear and max-linear factor models with heavy-tailed latent factors, in which the stable tail dependence function $L$ is parameterized by a $d \times K$ loading matrix $A$. Under a pure variable assumption, the authors establish identifiability of both the number of factors $K$ and the loading matrix up to permutation. They propose two estimation algorithms, PureVar and HTSP, with finite-sample guarantees that remain valid when the dimension $d$ exceeds the sample size $n$. A Monte Carlo study and two case studies (dietary intakes and wind gusts) illustrate strengths and limitations, and the authors discuss possible extensions such as a two-stage factor model that relaxes the pure variable constraint. The resulting tools for estimating extremal dependence in high dimensions are relevant to fields such as finance, environmental science, and health. The methods rest on the stable tail dependence function $L$ and its relation to spectral measures, which link tail observations to extremal directions and the factor structure.
Abstract
A common object to describe the extremal dependence of a $d$-variate random vector $X$ is the stable tail dependence function $L$. Various parametric models have emerged, with a popular subclass consisting of those stable tail dependence functions that arise for linear and max-linear factor models with heavy-tailed factors. The stable tail dependence function is then parameterized by a $d \times K$ matrix $A$, where $K$ is the number of factors and where $A$ can be interpreted as a factor loading matrix. We study estimation of $L$ under an additional assumption on $A$ called the "pure variable assumption". Both $K \in \{1, \dots, d\}$ and $A \in [0, \infty)^{d \times K}$ are treated as unknown, which constitutes an unconventional parameter space that does not fit into common estimation frameworks. We suggest two algorithms for estimating $K$ and $A$, and provide finite-sample guarantees for both algorithms. Remarkably, the guarantees allow for the case where the dimension $d$ is larger than the sample size $n$. The results are illustrated with numerical experiments and two case studies.
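
To make the parameterization concrete, the following minimal sketch evaluates a stable tail dependence function of max-linear type, $L(x) = \sum_{k=1}^{K} \max_{j} A_{jk} x_j$, for a small loading matrix. It rests on assumptions not stated in the abstract: the rows of $A$ are rescaled to sum to one (so that margins are standardized), and the pure variable assumption is read as "each factor has at least one variable loading on that factor only". The function names `normalize_rows` and `stdf_max_linear` and the example matrix are hypothetical and are not the PureVar or HTSP algorithms.

```python
import numpy as np


def normalize_rows(A):
    """Rescale each row of a nonnegative matrix to sum to one (assumed convention)."""
    A = np.asarray(A, dtype=float)
    return A / A.sum(axis=1, keepdims=True)


def stdf_max_linear(A, x):
    """Evaluate L(x) = sum_k max_j A[j, k] * x[j] for a row-normalized d x K matrix A."""
    A = np.asarray(A, dtype=float)
    x = np.asarray(x, dtype=float)
    # Scale row j by x[j], take the maximum within each column, then sum over factors.
    return float((A * x[:, None]).max(axis=0).sum())


# Hypothetical loading matrix with d = 4 variables and K = 2 factors.
# Rows 0 and 1 are "pure" (each loads on a single factor); rows 2 and 3 are mixed.
A = normalize_rows([[1, 0],
                    [0, 1],
                    [1, 1],
                    [3, 1]])

# Pairwise extremal coefficients L(e_i + e_j), which lie between 1 and 2.
pure_vs_pure = stdf_max_linear(A, [1, 1, 0, 0])   # pure variables of different factors
pure_vs_mixed = stdf_max_linear(A, [1, 0, 1, 0])  # pure variable and a mixed variable
all_ones = stdf_max_linear(A, [1, 1, 1, 1])       # equals K when every factor has a pure variable
```

In this toy example, two pure variables attached to different factors have a pairwise extremal coefficient of 2, which corresponds to asymptotic independence, while a pure and a mixed variable give a value strictly between 1 and 2. Structure of this kind in pairwise tail dependence is what makes $K$ and $A$ recoverable in principle under a pure variable condition.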
