Extremes of structural causal models
Sebastian Engelke, Nicola Gnecco, Frank Röttger
TL;DR
This work develops a theory of extremes for structural causal models (SCMs), showing that tail behavior is captured by a multivariate Pareto distribution on an extremal DAG $G_e$, a subgraph of the original causal graph $G$. It introduces directed extremal graphical models and proves that the tail limit satisfies the corresponding Markov properties, connecting extremal SCMs to graphical structure in the tails. Two structure-learning approaches are proposed, extremal PC and extremal pruning, each using an extremal conditional independence test to recover $G_e$ from data; the work provides consistency results and an application to a river network. The framework enables causal inference and extrapolation in the distributional tail, revealing when causal links vanish under extremal interventions and providing practical tools for tail-based causal discovery.
Abstract
The behavior of extreme observations is well understood for time series and spatial data, but little is known when the data-generating process is a structural causal model (SCM). We study the behavior of extremes in this model class, both for the observational distribution and under extremal interventions. We show that under suitable regularity conditions on the structure functions, the extremal behavior is described by a multivariate Pareto distribution, which can be represented as a new SCM on an extremal graph. Importantly, the latter is a subgraph of the graph in the original SCM, which means that causal links can disappear in the tails. We further introduce a directed version of extremal graphical models and show that an extremal SCM satisfies the corresponding Markov properties. Based on a new test of extremal conditional independence, we propose two algorithms for learning the extremal causal structure from data. The first is an extremal version of the PC algorithm, and the second is a pruning algorithm that removes edges from the original graph to consistently recover the extremal graph. The methods are illustrated on river data with known causal ground truth.
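The claim that causal links can disappear in the tails can be made concrete with a toy simulation. The sketch below is an illustration under assumptions of our own, not the paper's construction: the two-node SCM, the Pareto noise with tail index `alpha`, the saturating structure function `min(x1, 2)`, and the empirical tail dependence coefficient `chi_hat` are all hypothetical choices made for this example. Both SCMs have the edge X1 -> X2, but only the linear one keeps X1 and X2 dependent in their joint extremes.

```python
import numpy as np

rng = np.random.default_rng(0)
n, alpha, q = 200_000, 2.0, 0.99  # sample size, tail index, quantile level (all assumed)

# Classical Pareto(alpha) noise on [1, inf); NumPy's pareto draws Lomax, so shift by 1.
e1 = rng.pareto(alpha, n) + 1.0
e2 = rng.pareto(alpha, n) + 1.0

# Two toy SCMs on the same graph X1 -> X2.
x1 = e1
x2_linear = x1 + e2                       # effect of X1 persists for large X1
x2_saturating = np.minimum(x1, 2.0) + e2  # effect of X1 is bounded, so it saturates


def chi_hat(a, b, q):
    """Empirical estimate of P(b > its q-quantile | a > its q-quantile)."""
    ua, ub = np.quantile(a, q), np.quantile(b, q)
    return np.mean((a > ua) & (b > ub)) / (1.0 - q)


chi_linear = chi_hat(x1, x2_linear, q)
chi_saturating = chi_hat(x1, x2_saturating, q)
print(f"chi (linear):     {chi_linear:.3f}")
print(f"chi (saturating): {chi_saturating:.3f}")
```

In this toy setting, the estimate for the linear model stays well away from zero, while the estimate for the saturating model is close to the value expected under independence. This is consistent with an edge that is present in $G$ but, informally, carries no dependence in the tail.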
