
Structural Causal Bottleneck Models

Simon Bing, Jonas Wahl, Jakob Runge

TL;DR

This work introduces structural causal bottleneck models (SCBMs) as an alternative to existing causal dimension reduction frameworks such as causal representation learning and causal abstraction learning. It analyses identifiability in SCBMs, connects them to information bottlenecks in the sense of Tishby & Zaslavsky (2015), and illustrates how to estimate them experimentally.

Abstract

We introduce structural causal bottleneck models (SCBMs), a novel class of structural causal models. At the core of SCBMs lies the assumption that causal effects between high-dimensional variables only depend on low-dimensional summary statistics, or bottlenecks, of the causes. SCBMs provide a flexible framework for task-specific dimension reduction while being estimable via standard, simple learning algorithms in practice. We analyse identifiability in SCBMs, connect them to information bottlenecks in the sense of Tishby & Zaslavsky (2015), and illustrate how to estimate them experimentally. We also demonstrate the benefit of bottlenecks for effect estimation in low-sample transfer learning settings. We argue that SCBMs provide an alternative to existing causal dimension reduction frameworks like causal representation learning or causal abstraction learning.
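The core assumption, that the effect of a high-dimensional cause passes only through a low-dimensional summary, can be illustrated with a minimal linear sketch. Everything below is an assumption made for illustration and is not taken from the paper: the dimensions, the Gaussian noise, and the linear maps `B` (bottleneck) and `F` (effect) are hypothetical. The sketch shows that when $\mathbf{X}_2$ depends on $\mathbf{X}_1$ only through $\mathbf{Z} = B\mathbf{X}_1$, an ordinary least-squares map from $\mathbf{X}_1$ to $\mathbf{X}_2$ has effective rank equal to the bottleneck dimension.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: sample count, cause dim, bottleneck dim, effect dim.
n, d_x1, d_z, d_x2 = 2000, 20, 2, 15

# Hypothetical linear bottleneck map B and effect map F (not from the paper).
B = rng.normal(size=(d_z, d_x1))
F = rng.normal(size=(d_x2, d_z))

X1 = rng.normal(size=(n, d_x1))                    # high-dimensional cause
Z = X1 @ B.T                                       # low-dimensional bottleneck
X2 = Z @ F.T + 0.1 * rng.normal(size=(n, d_x2))    # effect depends on X1 only via Z

# Unconstrained least squares from X1 to X2.
W, *_ = np.linalg.lstsq(X1, X2, rcond=None)

# The fitted map approximates (F B)^T, so only d_z singular values are large.
singular_values = np.linalg.svd(W, compute_uv=False)
effective_rank = int(np.sum(singular_values > 0.05 * singular_values[0]))

print("effective rank of fitted map:", effective_rank)
```

In this toy setting the low-rank structure is visible directly in the regression coefficients. The nonlinear case described in the abstract would need a learned encoder instead of a singular value decomposition.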

Paper Structure (42 sections, 2 theorems, 19 equations, 7 figures, 2 tables, 1 algorithm)


Key Result

Lemma 4.1

Let $\mathfrak{C} = \langle \mathcal{G}, \mathcal{X}, \mathcal{Z}, \boldsymbol{\eta}, \mathcal{B},\mathcal{F}, \mathbf{X} \rangle$ be an SCBM. Assume that there are invertible maps $\psi_j: \mathcal{Z}_j \to \mathcal{Z}'_j$ for every endogenous node $j \in \mathcal{V}_{end}$ to some other spaces ...

Figures (7)

  • Figure 1: Examples of (a) factored bottleneck and effect functions and (b) intrinsic bottlenecks.
  • Figure 2: Left: Results of the identifiability experiments across various settings. We report the mean average $R^2$ along with its standard deviation. The top row shows results for linear SCBMs and the bottom row for nonlinear SCBMs. High average $R^2$ scores across models and settings indicate that we successfully learn the bottleneck variables up to a bijection. Right: Visualization of learned bottleneck spaces $\hat{\mathcal{Z}}$ w.r.t. the ground-truth space $\mathcal{Z}$. For both linear and nonlinear cases, $\hat{\mathcal{Z}}$ corresponds to $\mathcal{Z}$ up to a (linear or nonlinear) bijection.
  • Figure 3: Mean average $R^2$ and 95% confidence interval of identifying bottlenecks with misspecified assumed bottleneck dimension $d_{\hat{\mathbf{Z}}}$. The dashed vertical line indicates the ground-truth bottleneck dimension. For both linear and nonlinear settings, the metric increases until it saturates at $d_{\hat{\mathbf{Z}}} = d_{\mathbf{Z}}$, indicating that the true bottleneck dimension is a lower bound for identifiability.
  • Figure 4: Graph of the SCBM used for the transfer learning experiments. We assume that samples from the environment $e_1$, where all variables are jointly observed, are relatively scarce compared to the number of samples of environment $e_2$, where we only jointly observe $\mathbf{X}_1$ and $\mathbf{X}_3$.
  • Figure 5: Mean absolute error (MAE) and $95\%$ confidence interval of estimating the effect $\mathbf{X}_1 \rightarrow \mathbf{X}_2$ using different conditioning variables. For both linear and nonlinear SCBMs, using the estimated bottleneck variable is beneficial for small sample sizes.
  • ...and 2 more figures
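
The captions of Figures 2 and 3 score learned bottlenecks by $R^2$ "up to a bijection". One way such a score could be computed in the linear case is sketched below. It is an illustration under stated assumptions, not the paper's evaluation code: the helper `r2_up_to_linear_map`, the synthetic data, and the choice of an affine least-squares fit are all hypothetical, and the nonlinear case would need a flexible regressor instead.

```python
import numpy as np

def r2_up_to_linear_map(z_true, z_hat):
    """Average R^2 of the best affine map from z_hat to z_true (hypothetical helper)."""
    # Append a constant column so the fitted map includes an intercept.
    design = np.column_stack([z_hat, np.ones(len(z_hat))])
    coef, *_ = np.linalg.lstsq(design, z_true, rcond=None)
    residual = z_true - design @ coef
    ss_res = np.sum(residual ** 2, axis=0)
    ss_tot = np.sum((z_true - z_true.mean(axis=0)) ** 2, axis=0)
    # One R^2 per ground-truth dimension, then averaged.
    return float(np.mean(1.0 - ss_res / ss_tot))

rng = np.random.default_rng(0)
z_true = rng.normal(size=(500, 3))      # synthetic ground-truth bottleneck

# A learned representation that equals the truth up to a linear bijection.
A = rng.normal(size=(3, 3))             # invertible with probability one
z_hat_good = z_true @ A.T

# A representation with too few dimensions (assumed dimension 1 instead of 3).
z_hat_small = z_true[:, :1]

score_good = r2_up_to_linear_map(z_true, z_hat_good)
score_small = r2_up_to_linear_map(z_true, z_hat_small)
print(score_good, score_small)
```

The first score is close to one because a linear bijection can be undone by regression. The second is much lower because an undersized representation cannot recover all ground-truth dimensions, which mirrors the qualitative trend described for Figure 3.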

Theorems & Definitions (7)

  • Definition 2.1
  • Definition 2.2
  • Definition 2.4: Intrinsic bottlenecks
  • Lemma 4.1
  • Lemma 4.2
  • Proof of Lemma 4.1
  • Proof of Lemma 4.2