Extremes of structural causal models

Sebastian Engelke, Nicola Gnecco, Frank Röttger

TL;DR

This work develops a theory of extremes for structural causal models, showing that tail behavior is captured by a multivariate Pareto distribution on an extremal DAG $G_e$, a subgraph of the original causal graph $G$. It introduces directed extremal graphical models and proves Markov properties for the tail limit, connecting extremal SCMs to tail graphical structures. Two structure-learning approaches are proposed—extremal PC and extremal pruning—each leveraging an extremal conditional independence test to recover $G_e$ from data, with consistency results and a river network application. The framework enables causal inference and extrapolation in the distributional tail, revealing when causal links vanish under extremal interventions and providing practical tools for tail-based causal discovery.

Abstract

The behavior of extreme observations is well understood for time series or spatial data, but little is known when the data-generating process is a structural causal model (SCM). We study the behavior of extremes in this model class, both for the observational distribution and under extremal interventions. We show that under suitable regularity conditions on the structure functions, the extremal behavior is described by a multivariate Pareto distribution, which can be represented as a new SCM on an extremal graph. Importantly, the latter is a subgraph of the graph in the original SCM, which means that causal links can disappear in the tails. We further introduce a directed version of extremal graphical models and show that an extremal SCM satisfies the corresponding Markov properties. Based on a new test of extremal conditional independence, we propose two algorithms for learning the extremal causal structure from data. The first is an extremal version of the PC algorithm, and the second is a pruning algorithm that removes edges from the original graph to consistently recover the extremal graph. The methods are illustrated on river data with known causal ground truth.

Paper Structure

This paper contains 38 sections, 15 theorems, 156 equations, 9 figures, 2 algorithms.

Key Result

Theorem 1

Let $\mathbf{X}$ satisfy Assumption ass_main. Then, for any $v\in V\setminus\{1\}$, the extremal structure function is homogeneous, that is, $\Psi_v( \mathbf{x} + s \mathbf{1}, e) = \Psi_v( \mathbf{x}, e) + s$ for all $s\in\mathbb R$. Moreover, the distribution of $\mathbf{X}^*$ is multivariate regu for any Borel subset $A\subset \mathcal{L}^1$ with $\mathbb P(\mathbf{Y}^{1} \in \partial A) = 0$.
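The homogeneity property in Theorem 1, $\Psi_v(\mathbf{x} + s\mathbf{1}, e) = \Psi_v(\mathbf{x}, e) + s$, can be checked numerically for candidate structure functions. The sketch below is only an illustration: the three toy functions (`psi_logsumexp`, `psi_max`, `psi_linear`), their parameters, and the helper `is_homogeneous` are assumptions made for this example and are not taken from the paper.

```python
import math
import random


def psi_logsumexp(x, e, weights):
    """Hypothetical structure function: log of a positively weighted sum of exp(parents), plus noise."""
    return math.log(sum(w * math.exp(xj) for w, xj in zip(weights, x))) + e


def psi_max(x, e, shifts):
    """Hypothetical structure function: maximum of shifted parent values, plus noise."""
    return max(xj + c for xj, c in zip(x, shifts)) + e


def psi_linear(x, e, coefs):
    """Hypothetical linear function; homogeneous in the above sense only if coefs sum to 1."""
    return sum(c * xj for c, xj in zip(coefs, x)) + e


def is_homogeneous(psi, params, n_trials=1000, tol=1e-9, seed=0):
    """Check psi(x + s*1, e) == psi(x, e) + s on randomly drawn x, e and s."""
    rng = random.Random(seed)
    for _ in range(n_trials):
        x = [rng.uniform(-5.0, 5.0) for _ in params]
        e = rng.gauss(0.0, 1.0)
        s = rng.uniform(-10.0, 10.0)
        shifted = psi([xj + s for xj in x], e, params)
        if abs(shifted - (psi(x, e, params) + s)) > tol:
            return False
    return True


if __name__ == "__main__":
    print(is_homogeneous(psi_logsumexp, [0.3, 0.7]))
    print(is_homogeneous(psi_max, [0.0, -1.0]))
    print(is_homogeneous(psi_linear, [0.5, 0.2]))
```

The log-sum-exp and max forms pass because adding $s$ to every parent shifts the output by exactly $s$. The linear form with coefficients summing to 0.7 fails, since its output shifts by only $0.7s$. A random check of this kind can refute homogeneity but cannot prove it.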

Figures (9)

  • Figure 1: Scatter plots of $X_3$ and $X_4$ simulated according to the SCMs in Example \ref{ex:tail} (left) and Example \ref{ex_different_eSCM} (right) on graph $G$ in Figure \ref{subfig:b}. Blue lines represent an extremal intervention on $X_3$ and samples from the interventional distribution. Left: the data exhibits an extremal causal effect from $X_3$ to $X_4$ and the limiting extremal graph satisfies $G_e = G$. Right: the corresponding extremal graph $G_e$ is the DAG in Figure \ref{subfig:c}, which is different from $G$ since $X_3$ is not an extremal cause of $X_4$; extremal interventions on $X_3$ do not lead to an extreme outcome in $X_4$.
  • Figure 2: Examples of directed acyclic graphs on the nodes $V = \{1,\dots, 4\}$: a DAG with two root nodes (left); the diamond graph (center); a directed tree (right).
  • Figure 3: Data from the models in Example \ref{ex:normal} (left) and Example \ref{ex:exp} (right) sampled from the observational distribution (black) and the distribution after an extremal intervention on $X^*_1$ (blue).
  • Figure 4: Examples of three different configurations of extremal interventions on the extremal DAG $G_e$ in Figure \ref{fig:examples}(b). The hammer indicates the variables on which interventions are performed.
  • Figure 5: Percentage of estimated p-values below a significance level $\alpha \in (0, 1)$ under the null hypothesis $H_0 : \rho_{i, j \mid S} = 0$ (blue curves) and the alternative $H_1: \rho_{i, j \mid S} \neq 0$ (orange curves) for different methods and threshold levels $\tau \in \{0.9, 0.95, 0.975\}$ using both the average (solid line) and random (dashed line) tests, as described in the main text.
  • ...and 4 more figures

Theorems & Definitions (62)

  • Definition 1
  • Example 1
  • Example 2
  • Example 3
  • Example 4
  • Example 5
  • Definition 2
  • Example 6
  • Theorem 1
  • Remark 1
  • ...and 52 more