Causal Component Analysis

Liang Wendong; Armin Kekić; Julius von Kügelgen; Simon Buchholz; Michel Besserve; Luigi Gresele; Bernhard Schölkopf


TL;DR

This work introduces Causal Component Analysis (CauCA), a problem intermediate between Independent Component Analysis (ICA) and Causal Representation Learning (CRL): it assumes a known latent causal graph and seeks to recover the nonlinear unmixing map and the causal mechanisms from interventional data. The authors develop identifiability theory for CauCA under single-node and multi-node interventions, showing how the interventional structure constrains the remaining ambiguities in recovering the ground truth, with stronger guarantees when interventions are perfect and targeted. They also connect CauCA to nonlinear ICA, deriving new identifiability results that require fewer datasets, and propose a likelihood-based estimation approach using normalizing flows to recover both the unmixing function and the latent causal mechanisms. Extensive synthetic experiments validate the method in both the CauCA and ICA settings, demonstrating accurate latent recovery and clarifying how causal structure and interventions influence identifiability. Overall, CauCA provides a principled route to leveraging domain knowledge of latent causal graphs to achieve identifiability in nonlinear, nonparametric settings, with practical learning via flexible flow-based estimators. Key mathematical constructs include the mixing model $\mathbf{X} = \mathbf{f}(\mathbf{Z})$, where $\mathbf{Z}$ follows regime-specific distributions $\mathbb{P}^k$ with intervention targets $\tau_k$, and identifiability up to indeterminacy sets $\mathcal{S}$ (e.g., $\mathcal{S}_{\bar{G}}$ or $\mathcal{S}_{\text{scaling}}$) under interventional discrepancy assumptions, all framed within causal Bayesian networks and their interventional regimes.
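The generative setup summarized above can be illustrated with a small simulation. The following is a minimal sketch under assumptions that are not taken from the paper: a two-node graph $Z_1 \to Z_2$, linear Gaussian mechanisms, a hypothetical leaky-ReLU mixing `mix`, and perfect stochastic interventions that replace the targeted mechanism with $\mathcal{N}(2, 0.5^2)$. It only shows how one dataset per interventional regime could be produced.

```python
import numpy as np

# Hypothetical invertible mixing matrix (an assumption for illustration).
A = np.array([[1.0, 0.5], [-0.3, 1.0]])


def sample_latents(n, rng, target=None):
    """Sample Z = (Z1, Z2) for the known graph Z1 -> Z2.

    target=None is the unintervened regime; target=0 or target=1 applies a
    perfect stochastic intervention on that node (its mechanism is replaced
    by N(2, 0.5^2), which removes any dependence on its parents).
    """
    z1 = rng.normal(0.0, 1.0, size=n)
    if target == 0:
        z1 = rng.normal(2.0, 0.5, size=n)
    z2 = 0.8 * z1 + rng.normal(0.0, 1.0, size=n)
    if target == 1:
        z2 = rng.normal(2.0, 0.5, size=n)
    return np.stack([z1, z2], axis=1)


def mix(z):
    """Nonlinear invertible mixing: linear map followed by a leaky ReLU."""
    y = z @ A.T
    return np.where(y > 0, y, 0.2 * y)


def make_datasets(n=1000, seed=0):
    """Return one (target, observations) pair per interventional regime."""
    rng = np.random.default_rng(seed)
    targets = [None, 0, 1]
    return [(t, mix(sample_latents(n, rng, t))) for t in targets]


datasets = make_datasets()
```

Only the observations and the intervention targets are kept in `datasets`; the latents are discarded, mirroring the setting in which $\mathbf{Z}$ is unobserved.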

Abstract

Independent Component Analysis (ICA) aims to recover independent latent variables from observed mixtures thereof. Causal Representation Learning (CRL) aims instead to infer causally related (thus often statistically dependent) latent variables, together with the unknown graph encoding their causal relationships. We introduce an intermediate problem termed Causal Component Analysis (CauCA). CauCA can be viewed as a generalization of ICA, modelling the causal dependence among the latent components, and as a special case of CRL. In contrast to CRL, it presupposes knowledge of the causal graph, focusing solely on learning the unmixing function and the causal mechanisms. Any impossibility results regarding the recovery of the ground truth in CauCA also apply to CRL, while possibility results may serve as a stepping stone for extensions to CRL. We characterize CauCA identifiability from multiple datasets generated through different types of interventions on the latent causal variables. As a corollary, this interventional perspective also leads to new identifiability results for nonlinear ICA (a special case of CauCA with an empty graph) requiring strictly fewer datasets than previous results. We introduce a likelihood-based approach using normalizing flows to estimate both the unmixing function and the causal mechanisms, and demonstrate its effectiveness through extensive synthetic experiments in the CauCA and ICA settings.
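The likelihood-based approach can be made concrete with the standard change-of-variables formula. As a sketch, assume an invertible, differentiable candidate unmixing $\mathbf{g} = \mathbf{f}^{-1}$ (for instance a normalizing flow), a latent dimension $d$, and mechanism densities $p^k_i$ in regime $k$; this notation is introduced here for illustration and may differ from the paper's. The log-density of an observation in dataset $\mathcal{D}_k$ then factorizes along the known graph $G$:

```latex
\log p^k_{\mathbf{X}}(\mathbf{x})
  = \log \left| \det J_{\mathbf{g}}(\mathbf{x}) \right|
  + \sum_{i=1}^{d} \log p^k_i\!\left( g_i(\mathbf{x}) \,\middle|\, \mathbf{g}_{\mathrm{pa}(i)}(\mathbf{x}) \right)
```

Here $p^k_i$ is the unintervened mechanism for $i \notin \tau_k$ and the intervened mechanism for $i \in \tau_k$. With an empty graph, every parent set is empty and the sum reduces to a product of marginals, which is the nonlinear ICA special case.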


Paper Structure (39 sections, 27 theorems, 110 equations, 6 figures, 2 tables)


Introduction
Preliminaries
Problem Setting
Theory
Identifiability of CauCA
Special Case: ICA with stochastic interventions on the latent components
Experiments
Related Work
Discussion
Notations
Proofs
Lemmata
Proof of
Proof of
Proof of
...and 24 more sections

Key Result

Lemma 3.2

For any $(G,\mathbf{f},(\mathbb{P}^k, \tau_k)_{k\in \llbracket 0,K \rrbracket})$ in $(G, \mathcal{F}, \mathcal{P}_{G})$, and for any $\mathbf{h}\in \mathcal{S}_{\text{scaling}}$, there exists a $(G,\mathbf{f} \circ \mathbf{h},(\mathbb{Q}^k, \tau_k)_{k\in \llbracket 0,K \rrbracket})$ in $(G, \mathcal{F}, \mathcal{P}_{G})$ s.t. $\mathbf{f}_*\mathbb{P}^k=(\mathbf{f} \circ \mathbf{h})_*\mathbb{Q}^k$ ...
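Informally, the lemma says that composing the mixing with an element-wise map cannot be detected from the observations. A minimal numerical sketch follows, assuming that a linear element-wise rescaling $\mathbf{h}(\mathbf{w}) = \mathbf{s} \odot \mathbf{w}$ is a member of $\mathcal{S}_{\text{scaling}}$, and using a hypothetical mixing `f`, scaling vector `s`, and linear Gaussian mechanism on the graph $Z_1 \to Z_2$ (none of which come from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)
s = np.array([2.0, -0.5])  # hypothetical element-wise scaling h(w) = s * w


def f(z):
    """Hypothetical invertible mixing: linear map followed by a leaky ReLU."""
    A = np.array([[1.0, 0.5], [-0.3, 1.0]])
    y = z @ A.T
    return np.where(y > 0, y, 0.2 * y)


def f_h(w):
    """The alternative mixing f composed with h."""
    return f(s * w)


# Samples from P^0 on the graph Z1 -> Z2: Z2 = 0.8 * Z1 + noise.
z1 = rng.normal(size=5000)
z2 = 0.8 * z1 + rng.normal(size=5000)
z = np.stack([z1, z2], axis=1)

# Samples from Q^0, the law of h^{-1}(Z). W2 still depends only on W1,
# with coefficient 0.8 * s[0] / s[1], so Q^0 is Markov w.r.t. the same graph.
w = z / s

x_p = f(z)    # observations generated by (f, P^0)
x_q = f_h(w)  # observations generated by (f o h, Q^0)
same_observations = np.allclose(x_p, x_q)

# Least-squares slope of W2 on W1; expected to be close to 0.8 * 2.0 / (-0.5).
slope = np.polyfit(w[:, 0], w[:, 1], 1)[0]
```

Both models produce identical observations while their latents differ by a rescaling, which is why recovery of $\mathbf{Z}$ can at best hold up to such element-wise transformations.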

Figures (6)

Figure 1: Causal Component Analysis (CauCA). We posit that observed variables $\mathbf{X}$ are generated through a nonlinear mapping $\mathbf{f}$, applied to unobserved latent variables $\mathbf{Z}$ which are causally related. The causal structure $G$ of the latent variables is assumed to be known, while the causal mechanisms $\mathbb{P}_i(Z_i~|~\mathbf{Z}_{\text{pa}(i)})$ and the nonlinear mixing function are unknown and to be estimated. (Known or observed quantities are highlighted in red.) CauCA assumes access to multiple datasets $\mathcal{D}_k$ that result from stochastic interventions on the latent variables.
Figure 2: Violation of the Interventional Discrepancy Assumption. The shown distributions constitute a counterexample to identifiability that violates \ref{assum:basic} and thus allows for spurious solutions; see \ref{app:countereg} for technical details. (Left) Visualisation of the joint distributions of two independent latent components $z_1$ and $z_2$ after no intervention (red), and interventions on $z_1$ (green) and $z_2$ (blue). As can be seen, each distribution reaches the same plateau on some rectangular interval of the domain, coinciding within the red square. (Center/Right) Within the red square where all distributions agree, it is possible to apply a measure-preserving automorphism which leaves all distributions unchanged, but non-trivially mixes the latents. The right plot shows a distance-dependent rotation around the centre of the black circle, whereas the middle plot shows a reference identity transformation.
Figure 3: We use the "" symbol together with a "times" symbol to represent how many interventions are required by the two assumptions. (Left) (\ref{thm:CauCA_multi}) For \ref{assum:block-ID}, we need $n_k$ interventions to get block-identification of $\mathbf{z}_{\tau_k}$. (Right) (\ref{prop:blocktoreparam}) For the block-variability assumption, we need $2n_k$ interventions to obtain element-wise identification up to scaling and permutation.
Figure 4: Experimental results. Figures (a) and (e) present the mean correlation coefficients (MCC) between true and learned latents and log-probability differences between the model and ground truth ($\Delta$ log prob.) for CauCA experiments. Misspecified models assuming a trivial graph ($E(G){=}\varnothing$) and a linear encoder function class are compared. All violin plots show the distribution of outcomes for 10 pairs of CBNs and mixing functions. Figures (c) and (d) display CauCA results with varying numbers of nonlinearities in the mixing function and latent dimension. For the ICA setting, MCC values and log probability differences are illustrated in (b) and (f). Baselines include a misspecified model (linear mixing) and a naive (single-environment) unidentifiable normalizing flow with an independent Gaussian base distribution (labelled i.i.d.). The naive baseline is trained on pooled data without using information about interventions and their targets. Figure (g) shows the median MCC for CauCA and the misspecified baseline ($E(G){=}\varnothing$) as the strength of the linear parameters relative to the exogenous noise in the structural causal model generating the CBN increases. The shaded areas show the range between minimum and maximum values.
Figure 5: Representative cases for the CauCA settings described in \ref{def:CauCAgeneral}(i)-(iv). Each row corresponds to a dataset where one perfect intervention is performed on one of the targets: Cloud (C), Sprinkler (S), Rain (R), and Wet grass (W). Each column corresponds to an admissible choice of intervention targets within each of the settings: the discrepancies between the intervention targets $\tau'$(i)-$\tau'$(iv) and the ground truth targets $\tau$ are meant to illustrate different degrees of ignorance about $\tau$ across the settings in \ref{def:CauCAgeneral}(i)-(iv). In the known intervention targets setting, which we focused on in the main paper, $\tau'$(i) is the only possible choice: i.e., the targets must be perfectly aligned with the ground truth $\tau$. For the settings with known intervention targets up to graph automorphisms and with matched intervention targets, $\tau'$(ii) and $\tau'$(iii) represent admissible choices: identifiability results can be proved for both settings. We also show that in the setting of totally unknown targets, where $\tau'$(iv) is an admissible choice, CauCA is not identifiable (see \ref{remark:completely_unknown}).
...and 1 more figure

Theorems & Definitions (40)

Definition 2.1: Distribution Markov relative to a DAG (Pearl, 2009)
Definition 2.2: CBN
Remark 2.3
Definition 3.1: Latent CBN
Definition 3.2: Identifiability of CauCA
Lemma 3.2
Theorem 4.2
Proposition 4.2
Theorem 4.4
Proposition 4.4
...and 30 more
