Quantifying intrinsic causal contributions via structure preserving interventions

Dominik Janzing, Patrick Blöbaum, Atalanti A. Mastakouri, Philipp M. Faller, Lenon Minorics, Kailash Budhathoki

TL;DR

This work proposes a notion of causal influence that captures the 'intrinsic' part of a node's contribution to a target node in a DAG. Each node is written recursively as a function of the upstream noise terms, and the resulting contribution analysis is described for variance and entropy.

Abstract

We propose a notion of causal influence that describes the 'intrinsic' part of the contribution of a node on a target node in a DAG. By recursively writing each node as a function of the upstream noise terms, we separate the intrinsic information added by each node from the one obtained from its ancestors. To interpret the intrinsic information as a causal contribution, we consider 'structure-preserving interventions' that randomize each node in a way that mimics the usual dependence on the parents and does not perturb the observed joint distribution. To get a measure that is invariant with respect to relabelling nodes, we use Shapley-based symmetrization and show that it reduces in the linear case to simple ANOVA after resolving the target node into noise variables. We describe our contribution analysis for variance and entropy, but contributions for other target metrics can be defined analogously. The code is available in the package gcm of the open-source library DoWhy.
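
The claim that the Shapley-based measure reduces to simple ANOVA in the linear case can be checked on a toy example. The sketch below is not the DoWhy implementation; it assumes a hypothetical three-node linear SCM with independent noise terms, with coefficients and noise variances chosen purely for illustration. The target is resolved into its noise terms, the variance left after conditioning on a subset of noise terms is used as the target metric, and the marginal reductions are averaged with the usual Shapley weights.

```python
from itertools import combinations
from math import factorial

# Hypothetical linear SCM (all numbers are illustrative assumptions):
#   X1 = N1
#   X2 = a * X1 + N2
#   X3 = b * X2 + c * X1 + N3
a, b, c = 2.0, 0.5, 1.0
noise_var = {"N1": 1.0, "N2": 4.0, "N3": 0.25}

# Resolve the target X3 into noise terms:
#   X3 = (c + a * b) * N1 + b * N2 + N3
coef = {"N1": c + a * b, "N2": b, "N3": 1.0}


def remaining_variance(known):
    """Variance of X3 left after conditioning on the noise terms in `known`.

    For independent noise in a linear model, only the unknown terms contribute.
    """
    return sum(coef[n] ** 2 * noise_var[n] for n in coef if n not in known)


def shapley_icc(j):
    """Shapley-weighted average of the variance reduction from learning noise j."""
    others = [n for n in coef if n != j]
    n_players = len(coef)
    total = 0.0
    for k in range(len(others) + 1):
        for subset in combinations(others, k):
            weight = factorial(k) * factorial(n_players - k - 1) / factorial(n_players)
            before = remaining_variance(set(subset))
            after = remaining_variance(set(subset) | {j})
            total += weight * (before - after)
    return total


icc = {n: shapley_icc(n) for n in coef}
anova = {n: coef[n] ** 2 * noise_var[n] for n in coef}

for n in coef:
    print(n, icc[n], anova[n])
```

In this additive setting the marginal reduction from learning a noise term does not depend on which other terms are already known, so each Shapley value coincides with the squared coefficient times the noise variance, and the contributions sum to the total variance of the target.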

Paper Structure

This paper contains 38 sections, 2 theorems, 54 equations, 3 figures.

Key Result

Lemma 1

Let $N_1,\dots,N_n$ be noise variables of an SCM ${\cal M}$ with observed nodes $X_1,\dots,X_n$. Let $\tilde{{\cal M}}$ be a modified SCM with observed variables $X_1,\dots,X_{n+k}$ and noise variables $N_1,\dots,N_{n+k}$ modeling the same joint distribution on $X_1,\dots,X_n,N_1,\dots,N_n$. Assume for all $T \subset \{1,\dots,n\}$. Then ${\cal M}$ and $\tilde{{\cal M}}$ yield the same values for

Figures (3)

  • Figure 1: Left: Causal DAG for which it is already non-trivial to define the strength of the influence of $X_1$ on $X_3$ -- if one demands that this definition should also apply to the limiting case on the right (where the edge $X_1\to X_2$ disappeared).
  • Figure 2: Left: location of the $5$ stations at which water flows are recorded. Middle: Causal model for the flows with Weather as latent confounder. Right: ICC with bootstrap confidence bounds for each of the $4$ upstream stations and the target station (Samlesbury) itself.
  • Figure 3: Causal DAG of the AUTO MPG dataset for factors influencing fuel consumption, with cyl: cylinders, dis: displacement, hp: horsepower, wgt: weight, mpg: miles per gallon.

Theorems & Definitions (10)

  • Definition 1: Structural Causal Model (SCM)
  • Definition 2: intrinsic causal contribution (ICC)
  • Example 1: conditional Shannon entropy
  • Example 2: expected conditional variance
  • Definition 3: Shapley ICC
  • Example 3: linear SCMs
  • Definition 4: Shapley values
  • Lemma 1: dummy noise variables
  • Lemma 2: adding zero value players
  • Definition 5: Single arrow post-cutting distribution