
Implicit Causal Representation Learning via Switchable Mechanisms

Shayan Shirahmad Gale Bagi, Zahra Gharaee, Oliver Schulte, Mark Crowley

TL;DR

This work tackles learning implicit causal representations when ground-truth graphs are unavailable, focusing on soft interventions as a realistic setting. It introduces Augmented Implicit Causal Models with a causal mechanism switch variable $V$ and solution functions that model soft intervention effects via $h_i(v)$, enabling identifiability up to reparameterization under suitable assumptions. A formal identifiability theorem shows equivalence classes collapse when soft interventions are observed and decoders are diffeomorphisms, while training relies on a variational ELBO that jointly infers exogenous variables and the intervention switch. Empirically, ICRL-SM demonstrates improved causal disentanglement and action inference on synthetic data and causal-triplet benchmarks (Epic-Kitchens, ProcTHOR) compared to baselines, with stronger gains in sparser graphs and moderate intervention strengths. The results suggest soft-intervention modeling via a switch mechanism is a promising direction for robust, identifiable causal representation learning in real-world settings.
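The training objective mentioned above can be made concrete with a minimal sketch of a variational ELBO that carries separate KL terms for the exogenous variables and the switch variable. It assumes, purely for illustration, diagonal Gaussian posteriors and standard normal priors for every latent block. The paper's actual priors (for example, how $\tilde{E}$ depends on $E$ and the intervention) may differ, and the function names `gaussian_kl` and `elbo` are hypothetical.

```python
import math

def gaussian_kl(mu, logvar):
    """KL( N(mu, diag(exp(logvar))) || N(0, I) ), summed over dimensions.

    Assumption: a standard normal prior, chosen only for illustration.
    """
    return sum(0.5 * (math.exp(lv) + m * m - 1.0 - lv) for m, lv in zip(mu, logvar))

def elbo(recon_loglik, posteriors):
    """ELBO = reconstruction log-likelihood minus one KL term per latent block.

    posteriors: list of (mu, logvar) pairs, e.g. for E, E~ and the switch V.
    """
    return recon_loglik - sum(gaussian_kl(mu, lv) for mu, lv in posteriors)

# One block with mean 1 (KL = 0.5) and one block equal to the prior (KL = 0).
print(elbo(-2.0, [([1.0], [0.0]), ([0.0], [0.0])]))  # -2.5
```

Training would maximize this quantity (or minimize its negative); in a real model, `recon_loglik` would come from the decoder and the `(mu, logvar)` pairs from the encoders.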

Abstract

Learning causal representations from observational and interventional data in the absence of known ground-truth graph structures necessitates implicit latent causal representation learning. Implicit learning of causal mechanisms typically involves two categories of interventional data: hard and soft interventions. In real-world scenarios, soft interventions are often more realistic than hard interventions, as the latter require fully controlled environments. Unlike hard interventions, which directly force changes in a causal variable, soft interventions exert influence indirectly by affecting the causal mechanism. However, the subtlety of soft interventions imposes several challenges for learning causal models. One challenge is that the effects of soft interventions are ambiguous, since parental relations remain intact. In this paper, we tackle the challenges of learning causal models using soft interventions while retaining implicit modelling. We propose ICRL-SM, which models the effects of soft interventions by employing a causal mechanism switch variable designed to toggle between different causal mechanisms. In our experiments, we consistently observe improved learning of identifiable causal representations compared to baseline approaches.
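The difference between the two intervention types can be illustrated with a toy two-variable model in which an object's class is a parent of its color, mirroring Figure 1. The linear mechanisms, their coefficients, and all function names below are hypothetical choices for illustration, not the paper's experimental setup.

```python
import random

def observational_color(obj_class, noise):
    """Original mechanism: color depends on its parent (class) plus noise."""
    return 2.0 * obj_class + noise

def hard_intervened_color(obj_class, noise, value=5.0):
    """Hard intervention do(color := value): the class -> color edge is severed."""
    return value

def soft_intervened_color(obj_class, noise):
    """Soft intervention: a new mechanism (hypothetical coefficients), same parent."""
    return 0.5 * obj_class + 1.0 + noise

def sample_color(mechanism, obj_class, seed=0):
    """Evaluate a mechanism with reproducible Gaussian noise."""
    rng = random.Random(seed)
    return mechanism(obj_class, rng.gauss(0.0, 0.1))

for mech in (observational_color, hard_intervened_color, soft_intervened_color):
    # Compare the color for class 0 and class 1 under identical noise.
    print(mech.__name__, sample_color(mech, 0), sample_color(mech, 1))
```

Under the hard intervention, the color is the same for both classes; under the soft intervention it still changes with the class, only through a different mechanism. This persisting parental dependence is what makes soft-intervention effects harder to disentangle.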

Paper Structure

This paper contains 34 sections, 3 theorems, 52 equations, 8 figures, and 10 tables.

Key Result

Theorem 3.5

(Identifiability of latent causal models.) Let $\mathcal{M}=(\mathcal{A}, \mathcal{X}, g, \mathcal{I})$ and $\mathcal{M}'=(\mathcal{A}', \mathcal{X}, g', \mathcal{I})$ be two LCMs with shared observation space $\mathcal{X}$ and shared intervention targets $\mathcal{I}$. Suppose the following conditions … Then the following statements are equivalent: …

Figures (8)

  • Figure 1: Difference between hard interventions and soft interventions: As seen in the middle row, hard interventions sever connections with parents. Therefore, an object's class cannot have any effect on the object's color when we intervene on color. On the other hand, soft interventions, as shown in the bottom row, allow for such effects.
  • Figure 2: The architecture processes pre-intervention observations $X$, post-intervention observations $\tilde{X}$, and their differences $X - \tilde{X}$ (intervention displacement), encoding them into latent representations. Each encoder outputs the mean (M) and variance (V) of a probability distribution. By sampling from these distributions, we obtain the pre-intervention exogenous variables $E$, the post-intervention exogenous variables $\tilde{E}$, and the causal mechanism switch variable $V$. The exogenous variables are derived from the corresponding pre- and post-intervention encodings, while $V$ is obtained from the encoding of the differences between $X$ and $\tilde{X}$. The pre- and post-intervention exogenous variables are then passed through two fully connected (FC) layers, which predict the scale and location parameters. These predicted scale and location parameters, together with the post-intervention exogenous variables and the causal mechanism switch variable, are used in the solution function to compute the post-intervention causal variables $\tilde{Z}$. Here, $N$ denotes the total number of causal variables.
  • Figure 3: Generative model
  • Figure 4: Causal disentanglement for different numbers of causal variables.
  • Figure A1: The distribution of observed and causal variables in two causal models $\mathcal{M}$ and $\mathcal{M'}$ that belong to the same equivalence class up to reparameterization. (a) Ten observed samples in which $Z_1$ or $Z_2$ has been intervened on. (b) When $I=0$ (no intervention), the distributions of the causal variables in the two models are identical, but their ranges of values differ and can be mapped to each other using $\phi_{\mathcal{Z}}$. (c) The intervention on $Z_1$ ($I=1$). (d) The intervention on $Z_2$ ($I=2$). For $I=1$ and $I=2$, the distributions of the two models are again identical, but they differ across intervention targets because soft interventions change the conditional distribution of each causal variable given its parents. Also, for each value of $I$, the distributions of $\mathcal{M}$ and $\mathcal{M'}$ should shift in the same direction, since the intervention targets are known.
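The data flow described in the Figure 2 caption can be sketched in plain Python. Every learned component is replaced by a toy stand-in: `toy_encoder` and `toy_fc` are hypothetical placeholders for the trained encoders and FC layers, observations are assumed to have the same dimension $N$ as the latents for simplicity, and the additive form `loc + scale * e_tilde + h(v)` is an assumed solution function chosen only to show how location, scale, $\tilde{E}$ and $V$ combine. It is not the paper's equation.

```python
import math
import random

def reparameterize(mu, logvar, rng):
    """Sample from N(mu, exp(logvar)) as mu + sigma * eps (reparameterization trick)."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, logvar)]

def toy_encoder(x):
    """Hypothetical stand-in for a learned encoder: returns (mean, log-variance)."""
    return list(x), [-4.0] * len(x)

def toy_fc(e, e_tilde):
    """Hypothetical stand-in for the FC layers: per-variable (location, scale)."""
    loc = [0.5 * a + 0.1 * b for a, b in zip(e, e_tilde)]
    scale = [math.exp(0.1 * (a - b)) for a, b in zip(e, e_tilde)]  # exp keeps scale > 0
    return loc, scale

def solution_function(loc, scale, e_tilde, v, h=lambda vi: vi):
    """Assumed additive form z~_i = loc_i + scale_i * e~_i + h(v_i); not the paper's equation."""
    return [l + s * et + h(vi) for l, s, et, vi in zip(loc, scale, e_tilde, v)]

def forward(x, x_tilde, seed=0):
    """Map a (pre, post) observation pair to post-intervention causal variables Z~."""
    rng = random.Random(seed)
    e = reparameterize(*toy_encoder(x), rng)              # pre-intervention exogenous E
    e_tilde = reparameterize(*toy_encoder(x_tilde), rng)  # post-intervention exogenous E~
    displacement = [a - b for a, b in zip(x, x_tilde)]    # intervention displacement X - X~
    v = reparameterize(*toy_encoder(displacement), rng)   # causal mechanism switch V
    loc, scale = toy_fc(e, e_tilde)
    return solution_function(loc, scale, e_tilde, v)

z_tilde = forward([1.0, 2.0], [1.0, 0.5])
print(len(z_tilde))  # one value per causal variable (N = 2 here)
```

The design choice worth noting is that $V$ is inferred only from the displacement $X - \tilde{X}$, so in this sketch a pair with no change yields a switch sample centred on zero, while the hypothetical `h` decides how strongly a nonzero switch alters each mechanism.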

Theorems & Definitions (9)

  • Definition 3.2
  • Definition 3.4
  • Theorem 3.5
  • Definition A1.1
  • Definition A1.2
  • Definition A1.3
  • Definition A1.4
  • Lemma A1.5
  • Theorem A1.6