Learning extremal graphical structures in high dimensions

Sebastian Engelke, Michaël Lalancette, Stanislav Volgushev

TL;DR

This work develops a principled framework for learning extremal graphical structures in high dimensions. It introduces non-asymptotic concentration bounds for empirical extremal variograms and a general EGlearn majority-voting algorithm to recover arbitrary extremal graphs under Hüsler–Reiss models, with consistency guarantees under explicit conditions. The methodology is validated through simulations on Barabási–Albert and block-graph models and applied to hydrological and financial data, yielding interpretable tail-dependence graphs that align with domain knowledge. The results enable scalable, high-dimensional inference for tail dependencies and offer avenues for uncertainty quantification and extension to broader variogram estimators and base learners.

Abstract

Extremal graphical models encode the conditional independence structure of multivariate extremes. Key statistics for learning extremal graphical structures are empirical extremal variograms, for which we prove non-asymptotic concentration bounds that hold under general domain of attraction conditions. For the popular class of Hüsler--Reiss models, we propose a majority voting algorithm for learning the underlying graph from data through $L^1$ regularized optimization. Our concentration bounds are used to derive explicit conditions that ensure the consistent recovery of any connected graph. The methodology is illustrated through a simulation study as well as the analysis of river discharge and currency exchange data.
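The majority voting step mentioned above can be illustrated with a small sketch. It assumes that a base learner has already produced one 0/1 edge estimate per left-out node $m$, and that an edge is kept when strictly more than half of the votes that can see it agree. The function name `majority_vote`, the array layout, and the strict-majority tie rule are assumptions made for this illustration and are not taken from the paper's text.

```python
import numpy as np

def majority_vote(Z):
    """Combine d per-node edge estimates into one graph by majority vote.

    Z has shape (d, d, d): Z[m] is a symmetric 0/1 matrix of estimated
    edges from the step in which node m was left out. Row and column m
    of Z[m] are ignored. (Hypothetical layout chosen for this sketch.)
    """
    Z = np.asarray(Z, dtype=float)
    d = Z.shape[0]
    if Z.shape != (d, d, d) or d < 3:
        raise ValueError("Z must have shape (d, d, d) with d >= 3")

    counts = np.zeros((d, d))
    for m in range(d):
        keep = np.ones(d, dtype=bool)
        keep[m] = False                      # node m casts no vote on its own edges
        idx = np.ix_(keep, keep)
        counts[idx] += Z[m][idx]

    # Each pair (i, j) receives d - 2 votes; keep it on a strict majority.
    adj = counts > (d - 2) / 2.0
    np.fill_diagonal(adj, False)
    return adj

# Toy example: a 4-cycle, with one spurious edge (1, 3) in the vote of m = 0.
true_adj = np.zeros((4, 4), dtype=int)
for i, j in [(0, 1), (1, 2), (2, 3), (3, 0)]:
    true_adj[i, j] = true_adj[j, i] = 1

Z = np.repeat(true_adj[None, :, :], 4, axis=0)
Z[0, 1, 3] = Z[0, 3, 1] = 1                  # incorrect vote

est = majority_vote(Z)
print(est.astype(int))
```

In this toy run the spurious edge receives one of two possible votes, which is not a strict majority, so the vote recovers the 4-cycle.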

Paper Structure

This paper contains 60 sections, 22 theorems, 351 equations, 14 figures, 1 table.

Key Result

Theorem 1

Let Assumption assum:tail hold and let $\zeta \in (0, 1)$ be arbitrary. There exist positive constants $C$, $c$ and $M$ depending only on $K$, $\xi$ and $\zeta$ such that for any $n^\zeta \leq k \leq n/2$ and $\lambda \leq \sqrt{k}/(\log n)^4$, [...] If in addition Assumption assum:r holds, there exists a positive constant $\bar{C}$ depending only on $K$, $\xi$, $\zeta$, $\varepsilon$ and $K(\beta)$ such that for any $k$ and [...]

Figures (14)

  • Figure 1: Four graph structures on the node set $V=\{1,\dots, 4\}$. From left to right: tree graph, block graph, decomposable graph, non-decomposable graph.
  • Figure 2: Illustration of the majority voting algorithm when the true underlying graph is the non-decomposable graph on the right-hand side of Figure 1. Left to right: graphical representation of the estimated matrices $\widetilde{Z}^{(m)}$, where the gray node $Y_m$ is not considered in the $m$th step, $m=1,\dots, 4$; black and red edges indicate correctly and incorrectly estimated edges, respectively.
  • Figure 3: Boxplots of 100 repetitions of the $F$-scores of different methods fitted to data from the model $\mathrm{BA}(d, q)$ of degree $q=1$ (left) and $q=2$ (right) and in dimension $d = 20$.
  • Figure 4: Boxplots of 100 repetitions of the $F$-scores of different methods fitted to data from the model $\mathrm{BA}(d, q)$ of degree $q=1$ (left) and $q=2$ (right) and in dimension $d = 100$.
  • Figure 5: The physical flow connection tree corresponding to the full Danube data (left) and to the data without stations 23--27 (middle), and the graph estimated by EGlearn and selected by MBIC (right).
  • ...and 9 more figures

Theorems & Definitions (50)

  • Example 1: Extremal logistic model
  • Example 2: Hüsler--Reiss model
  • Example 3: Extremal logistic model, continued
  • Example 4: Hüsler--Reiss model, continued
  • Example 5: Extremal tree models
  • Remark 1: On the role of $k$
  • Remark 2
  • Theorem 1
  • Remark 3
  • Remark 4
  • ...and 40 more