Learning Causal Abstractions of Linear Structural Causal Models

Riccardo Massidda, Sara Magliacane, Davide Bacciu

TL;DR

This work develops a comprehensive theory of linear causal abstractions between Structural Causal Models under a linear transformation ${\boldsymbol{T}}$, establishing necessary and sufficient conditions that relate abstract edges to concrete paths and showing how concrete variables arrange into blocks that respect the abstract causal order. It introduces Abs-LiNGAM, a data-efficient method that jointly learns the abstract model, the abstraction function, and a constrained concrete model from observational data with non-Gaussian noise, thereby speeding up large-scale causal discovery. Theoretical results include disjointness of relevant variable sets, block ordering consistency, and a constructive sampling framework for all concretizations; empirically, Abs-LiNGAM reduces search space and computation time while preserving accuracy. The approach offers a scalable pathway for multi-level causal reasoning in settings such as interpretable ML and complex systems, with potential extensions to non-linear settings and relaxation of causal sufficiency.

Abstract

The need for modelling causal knowledge at different levels of granularity arises in several settings. Causal Abstraction provides a framework for formalizing this problem by relating two Structural Causal Models at different levels of detail. Despite increasing interest in applying causal abstraction, e.g., in the interpretability of large machine learning models, the graphical and parametric conditions under which a causal model can abstract another are not known. Furthermore, learning causal abstractions from data is still an open problem. In this work, we tackle both issues for linear causal models with linear abstraction functions. First, we characterize how the low-level coefficients and the abstraction function determine the high-level coefficients and how the high-level model constrains the causal ordering of low-level variables. Then, we apply our theoretical results to learn high-level and low-level causal models and their abstraction function from observational data. In particular, we introduce Abs-LiNGAM, a method that leverages the constraints induced by the learned high-level model and the abstraction function to speed up the recovery of the larger low-level model, under the assumption of non-Gaussian noise terms. In simulated settings, we show the effectiveness of learning causal abstractions from data and the potential of our method in improving the scalability of causal discovery.

Paper Structure (54 sections, 13 theorems, 93 equations, 19 figures, 3 tables, 2 algorithms)


Key Result

Lemma 1

Let $\mathcal{H}$ be a ${\boldsymbol{\mathbf{T}}}$-abstraction of $\mathcal{L}$, where $\mathcal{H}$ and $\mathcal{L}$ are two linear SCMs on variables $\bm{{Y}}$ and $\bm{{X}}$, respectively. Then, for any pair of distinct abstract variables $Y_1, Y_2\in\bm{{Y}}$, it holds that $\Pi_R(Y_1)\cap\Pi_R(Y_2)=\emptyset$.
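
Lemma 1 concerns the relevant-variable sets of distinct abstract variables. Definition 2 is only listed on this page, not stated, so the sketch below assumes that the relevant variables of an abstract variable are the concrete variables with a non-zero coefficient in the corresponding column of ${\boldsymbol{\mathbf{T}}}$, stored as a $d \times b$ array (concrete by abstract). The function names, the tolerance `tol`, and the example matrices are hypothetical and chosen only for illustration.

```python
import numpy as np


def relevant_sets(T, tol=1e-12):
    """For each column of T (one abstract variable), return the set of
    row indices (concrete variables) whose coefficient is non-zero.

    Assumption: "relevant" means "non-zero entry in T"; this is an
    illustrative reading, not a definition taken from the text above.
    """
    T = np.asarray(T, dtype=float)
    return [
        set(np.flatnonzero(np.abs(T[:, j]) > tol).tolist())
        for j in range(T.shape[1])
    ]


def relevant_sets_are_disjoint(T, tol=1e-12):
    """Column supports are pairwise disjoint exactly when every row of T
    has at most one non-zero entry."""
    T = np.asarray(T, dtype=float)
    nonzeros_per_row = (np.abs(T) > tol).sum(axis=1)
    return bool(np.all(nonzeros_per_row <= 1))


# Hypothetical transformation: 4 concrete variables, 2 abstract ones.
# The last concrete variable is ignored (all-zero row).
T_disjoint = np.array([
    [1.0, 0.0],
    [0.5, 0.0],
    [0.0, 2.0],
    [0.0, 0.0],
])

# Same shape, but concrete variable 1 now feeds both abstract variables.
T_overlap = T_disjoint.copy()
T_overlap[1, 1] = 0.3

print(relevant_sets(T_disjoint))
print(relevant_sets_are_disjoint(T_disjoint))
print(relevant_sets_are_disjoint(T_overlap))
```

Under this reading, a matrix such as `T_overlap` could not be the transformation of a valid abstraction, since one concrete variable would be relevant to two abstract variables at once.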

Figures (19)

  • Figure 1: An overview of our contributions: (a) A linear SCM $\mathcal{H}$, representing the abstract causal model, is a ${\boldsymbol{\mathbf{T}}}$-abstraction of a linear SCM $\mathcal{L}$, representing the concrete causal model, whenever the linear transformation ${\boldsymbol{\mathbf{T}}}$ from concrete to abstract variables is interventionally consistent, i.e., whenever it relates both values and interventions on the abstract model and the concrete model. We prove that, for each abstract variable $Y$, the transformation ${\boldsymbol{\mathbf{T}}}$ induces a block $\Pi(Y)$ of concrete causal variables that necessarily follows the causal ordering of the abstract model and whose parameters are constrained by the abstract coefficients. For each block, the abstraction function depends on a possibly smaller subset of relevant variables, which we portray as dashed. (b) We propose Abs-LiNGAM, a method to speed up the causal discovery of the concrete model $\mathcal{L}$ given an additional dataset $\mathcal{D}_{J}$ sampled from the joint distribution of the abstract and the concrete model. In order, Abs-LiNGAM (i) reconstructs the transformation ${\boldsymbol{\mathbf{T}}}$, (ii) fits the abstract model by abstracting the concrete dataset $\mathcal{D}_{\mathcal{L}}$, (iii) infers a set of constraints ${\boldsymbol{\mathbf{K}}}$ on which paths cannot exist in the concrete graph, and finally (iv) discovers the concrete model in a search space reduced by the constraints.
  • Figure 2: We report the performance of Abs-LiNGAM for (a) an increasing number of paired samples $|\mathcal{D}_J|$ and (b) an increasing number of concrete nodes $|\bm{{X}}|$. We plot a variant of Abs-LiNGAM where we bootstrap the abstract causal discovery step with five repetitions. We report the area under the ROC curve and the execution time over 30 runs on randomly generated Erdős-Rényi abstract graphs with $b=5$ nodes and 8 edges. In the first experiment, we sample for each abstract graph a concrete model with random size $|\bm{{X}}| \in [25, 50]$. In the second experiment, we also vary the number of paired samples to always be twice the number of concrete nodes.
  • Figure 3: Results of Abs-LiNGAM over pairs of abstract ($b=5$ nodes) and concrete ($d\in[25,50]$ nodes) linear SCMs after perturbing the abstract observations with normal noise of increasing variance $\sigma^2$. We denote as "Top-1" the strategy where we force the selection of at most a single abstract variable per concrete one and as "Top-1-Refit" the one where we then refit each abstraction vector. All results are averaged over 30 independent runs with $|\mathcal{D}_{\mathcal{L}}| = 20000$ concrete samples and $|\mathcal{D}_{J}| = 150$ paired samples.
  • Figure 4: Visualization of a pair of concrete-abstract models and their abstraction function. The abstract graph has 5 nodes and 8 edges while the concrete has 5 blocks of random size from $[5, 10]$, with an additional block for the ignored variables.
  • Figure 5: Results of Abs-LiNGAM over pairs of abstract ($b=5$ nodes) and concrete ($d\in[25,50]$ nodes) linear SCMs. In all subfigures we plot the results for an increasing number of paired samples $\mathcal{D}_J$ and we report the average size of the concrete graphs as a vertical dashed line. Abs-LiNGAM-GT denotes a ground truth oracle where the abstraction function and the abstract model are given. The first plot (top left) shows the ROC-AUC of the retrieved concrete causal model $\hat{\mathcal{L}}$. The second plot (top right) shows the execution time required to retrieve the concrete causal model. The third and fourth plots (bottom) show the precision and recall of the prior knowledge inferred by the learned abstraction function $\hat{{\boldsymbol{\mathbf{T}}}}$ and the consequent abstract model $\hat{\mathcal{H}}$. All results are averaged over 30 independent runs with $|\mathcal{D}_{\mathcal{L}}| = 15000$ concrete samples.
  • ...and 14 more figures
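
Steps (i) and (ii) of Abs-LiNGAM in the Figure 1 caption can be sketched roughly as follows. The caption does not say how ${\boldsymbol{\mathbf{T}}}$ is reconstructed, so ordinary least squares on the paired dataset is used here purely as a stand-in. The concrete data are drawn as independent uniform variables rather than from a structured SCM, the setting is noiseless, and every name (`estimate_abstraction`, `abstract_dataset`, `T_true`, the sample sizes) is a hypothetical choice for illustration. Steps (iii) and (iv) are not covered.

```python
import numpy as np

rng = np.random.default_rng(0)

d, b = 6, 2  # number of concrete and abstract variables (toy sizes)

# Hypothetical ground-truth transformation with disjoint blocks;
# concrete variable 5 is ignored by the abstraction.
T_true = np.zeros((d, b))
T_true[:3, 0] = [1.0, -0.5, 2.0]
T_true[3:5, 1] = [0.7, 1.5]


def sample_concrete(n):
    # Stand-in for sampling the concrete model: independent uniform
    # (non-Gaussian) variables with no causal structure among them.
    return rng.uniform(-1.0, 1.0, size=(n, d))


def estimate_abstraction(X_paired, Y_paired, tol=1e-8):
    """Step (i), assumed estimator: least-squares fit of Y = X @ T on the
    paired samples, followed by zeroing numerically negligible entries."""
    T_hat, *_ = np.linalg.lstsq(X_paired, Y_paired, rcond=None)
    T_hat[np.abs(T_hat) < tol] = 0.0
    return T_hat


def abstract_dataset(X_concrete, T_hat):
    """Step (ii), first half: map concrete samples to abstract samples.
    Fitting the abstract model on the result is left out of this sketch."""
    return X_concrete @ T_hat


# Small paired dataset D_J (twice as many samples as concrete variables).
X_J = sample_concrete(2 * d)
Y_J = X_J @ T_true

# Larger concrete-only dataset D_L.
X_L = sample_concrete(1000)

T_hat = estimate_abstraction(X_J, Y_J)
Y_L = abstract_dataset(X_L, T_hat)

print(np.round(T_hat, 3))
print(Y_L.shape)
```

With noisy abstract observations, as in the Figure 3 setting, the hard threshold above would likely need to be replaced by a more careful selection rule. The sketch does not attempt that.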

Theorems & Definitions (50)

  • Definition 1: ${\boldsymbol{\mathbf{T}}}$-Abstraction
  • Definition 2: Relevant Variables
  • Lemma 1: Disjoint Relevant
  • proof
  • Corollary 1: Constructive Abstraction
  • proof
  • Definition 3: ${\boldsymbol{\mathbf{T}}}$-direct Path
  • Lemma 2: Sufficient Abstract Connectivity
  • proof
  • Corollary 2: Sufficient Directed Paths
  • ...and 40 more