Causal Component Analysis
Liang Wendong, Armin Kekić, Julius von Kügelgen, Simon Buchholz, Michel Besserve, Luigi Gresele, Bernhard Schölkopf
TL;DR
This work introduces Causal Component Analysis (CauCA), an intermediate framework between Independent Component Analysis and Causal Representation Learning that assumes a known latent causal graph and seeks to recover the nonlinear unmixing map and causal mechanisms from interventional data. The authors develop identifiability theory for CauCA under single-node and multi-node interventions, showing how the interventional structure constrains the ambiguities that remain in the recovered latents, with stronger guarantees when interventions are perfect and targeted. They also connect CauCA to nonlinear ICA, deriving new identifiability results that require fewer datasets, and propose a likelihood-based estimation approach using normalizing flows to recover both the unmixing function and the latent causal mechanisms. Extensive synthetic experiments validate the method in both CauCA and ICA settings, demonstrating accurate latent recovery and clarifying how causal structure and interventions influence identifiability. Overall, CauCA provides a principled route to leveraging domain knowledge of latent causal graphs to achieve identifiability in nonlinear, nonparametric settings, with practical learning via flexible flow-based estimators. Key mathematical constructs include X = f(Z), where Z follows regime-specific distributions P^k determined by the intervention targets τ_k, and identifiability up to indeterminacy sets S (e.g., S_{\bar{G}} or S_{\text{scaling}}) under interventional discrepancy assumptions, all framed within causal Bayesian networks and their interventional regimes.
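As a sketch of how a likelihood-based estimator connects to these constructs, the standard change-of-variables formula gives the log-density of an observation in regime k. The symbols g (a learned unmixing map intended to approximate f^{-1}), d (the number of latent variables), and p^k_i (the mechanism of Z_i in regime k) are notation assumed here for illustration and may differ from the paper's.

```latex
\log p^k_X(x) \;=\; \log\left|\det J_g(x)\right| \;+\; \sum_{i=1}^{d} \log p^k_i\!\left(z_i \mid z_{\mathrm{pa}(i)}\right),
\qquad z = g(x)
```

Here pa(i) denotes the parents of node i in the known graph. Under a perfect intervention on a node i in τ_k, the factor p^k_i no longer depends on z_{pa(i)}, while the mechanisms of non-intervened nodes are shared with the observational regime.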
Abstract
Independent Component Analysis (ICA) aims to recover independent latent variables from observed mixtures thereof. Causal Representation Learning (CRL) aims instead to infer causally related (thus often statistically dependent) latent variables, together with the unknown graph encoding their causal relationships. We introduce an intermediate problem termed Causal Component Analysis (CauCA). CauCA can be viewed as a generalization of ICA, modelling the causal dependence among the latent components, and as a special case of CRL. In contrast to CRL, it presupposes knowledge of the causal graph, focusing solely on learning the unmixing function and the causal mechanisms. Any impossibility results regarding the recovery of the ground truth in CauCA also apply to CRL, while possibility results may serve as a stepping stone for extensions to CRL. We characterize CauCA identifiability from multiple datasets generated through different types of interventions on the latent causal variables. As a corollary, this interventional perspective also leads to new identifiability results for nonlinear ICA (a special case of CauCA with an empty graph), requiring strictly fewer datasets than previous results. We introduce a likelihood-based approach using normalizing flows to estimate both the unmixing function and the causal mechanisms, and demonstrate its effectiveness through extensive synthetic experiments in both the CauCA and ICA settings.
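The data setting described above can be illustrated with a minimal NumPy sketch. Everything specific in it is an assumption made for illustration rather than the paper's experimental setup: the two-node graph Z1 -> Z2, the linear-Gaussian mechanisms, the mixing function `mixing`, and the names `sample_latents`, `latents`, and `datasets`. It generates one observational dataset and one dataset per perfect single-node intervention, all sharing the same mixing.

```python
import numpy as np

# Hypothetical invertible linear part of the mixing (det = 1.15, so nonsingular).
A = np.array([[1.0, 0.5],
              [-0.3, 1.0]])

def sample_latents(n, rng, target=None):
    """Sample (Z1, Z2) for the known graph Z1 -> Z2.

    target=None gives the observational regime; target=0 or 1 applies a
    perfect intervention on that node (its mechanism is replaced by a new
    distribution that ignores the node's parents).
    """
    if target == 0:
        z1 = rng.normal(2.0, 0.5, size=n)        # intervened root mechanism
    else:
        z1 = rng.normal(0.0, 1.0, size=n)        # observational root mechanism
    if target == 1:
        z2 = rng.normal(-1.0, 0.5, size=n)       # perfect: no dependence on Z1
    else:
        z2 = 0.8 * z1 + rng.normal(0.0, 1.0, size=n)
    return np.stack([z1, z2], axis=1)

def mixing(z):
    """Toy nonlinear mixing X = f(Z).

    The elementwise map u -> tanh(u) + 0.5 * u is strictly increasing, and A
    is nonsingular, so f is invertible.
    """
    u = z @ A.T
    return np.tanh(u) + 0.5 * u

rng = np.random.default_rng(0)
regimes = {"obs": None, "do_z1": 0, "do_z2": 1}

# One latent sample and one observed dataset per regime; f is shared.
latents = {name: sample_latents(5000, rng, target) for name, target in regimes.items()}
datasets = {name: mixing(z) for name, z in latents.items()}
```

In this toy setup, a learner in the CauCA setting would see only `datasets`, the graph Z1 -> Z2, and the intervention targets; `latents` is kept solely to check recovery against ground truth.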
