Multi-View Causal Representation Learning with Partial Observability
Dingling Yao, Danru Xu, Sébastien Lachapelle, Sara Magliacane, Perouz Taslakian, Georg Martius, Julius von Kügelgen, Francesco Locatello
TL;DR
This work introduces a general framework for recovering latent content blocks from multiple partially observed views, under nonlinear mixtures and possible causal relationships. Central to the approach are content encoders that align shared content across views while enforcing invertibility through entropy regularization and projection mechanisms, enabling identifiability up to smooth bijections. The authors establish theoretical results (including an identifiability algebra) showing when and how content blocks can be recovered from various subsets of views, and demonstrate broad applicability by unifying prior nonlinear ICA, disentanglement, and causal representation learning results. Empirically, they validate the theory across synthetic and real-world multimodal datasets, showing that multiple blocks of latent content can be learned simultaneously and that prior methods emerge as special cases of the proposed framework. The work highlights the practical potential of leveraging multiple partial views to obtain finer-grained representations, while noting challenges such as non-convex optimization and finite-sample limitations, and pointing to future directions including interventions and causal marginal analysis.
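The interplay between alignment and entropy regularization mentioned above can be illustrated with a small sketch. It assumes an InfoNCE-style contrastive loss, a common surrogate in which the numerator pulls paired encodings together and the log-normalizer rewards spread-out encodings; this is not necessarily the exact objective used by the authors. The function names, the squared-distance similarity, and the toy data are all hypothetical choices made for illustration.

```python
import math
import random


def sq_dist(u, v):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def info_nce(z_a, z_b, temperature=1.0):
    """One-directional InfoNCE: each z_a[i] must pick out z_b[i] among all z_b."""
    total = 0.0
    for i, anchor in enumerate(z_a):
        # Similarity score: negative squared distance (an assumed choice).
        logits = [-sq_dist(anchor, other) / temperature for other in z_b]
        # Numerically stable log-sum-exp over all candidates.
        m = max(logits)
        log_norm = m + math.log(sum(math.exp(l - m) for l in logits))
        # Alignment term (-logits[i]) plus spread term (log_norm).
        total += log_norm - logits[i]
    return total / len(z_a)


def symmetric_info_nce(z_a, z_b, temperature=1.0):
    """Average of the two directions, so both encoders are treated alike."""
    return 0.5 * (info_nce(z_a, z_b, temperature) + info_nce(z_b, z_a, temperature))


# Toy data: two "views" whose encodings agree on a 3-dimensional content block.
random.seed(0)
content = [[random.gauss(0, 1) for _ in range(3)] for _ in range(32)]
noisy = [[c + 0.01 * random.gauss(0, 1) for c in row] for row in content]

# A collapsed encoder maps every sample to the same point: perfectly aligned,
# but it carries no information about the content.
collapsed = [[0.0, 0.0, 0.0] for _ in range(32)]

aligned_loss = symmetric_info_nce(content, noisy)
collapsed_loss = symmetric_info_nce(collapsed, collapsed)  # equals log(32)
```

In this toy setting the collapsed encoding is perfectly aligned yet receives the higher loss, which shows why alignment alone is not enough and some spread-promoting term is needed to keep the encoders informative.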
Abstract
We present a unified framework for studying the identifiability of representations learned from simultaneously observed views, such as different data modalities. We allow a partially observed setting in which each view constitutes a nonlinear mixture of a subset of underlying latent variables, which can be causally related. We prove that the information shared across all subsets of any number of views can be learned up to a smooth bijection using contrastive learning and a single encoder per view. We also provide graphical criteria indicating which latent variables can be identified through a simple set of rules, which we refer to as identifiability algebra. Our general framework and theoretical results unify and extend several previous works on multi-view nonlinear ICA, disentanglement, and causal representation learning. We experimentally validate our claims on numerical, image, and multi-modal data sets. Further, we demonstrate that the performance of prior methods is recovered in different special cases of our setup. Overall, we find that access to multiple partial views enables us to identify a more fine-grained representation, under the generally milder assumption of partial observability.
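As a toy illustration of the partially observed setting, the sketch below enumerates the content shared by each subset of views. It reads "information shared across a subset of views" as the intersection of the latent indices that each view depends on. The view names, latent indices, and the helper `shared_content` are hypothetical, and the code only does set bookkeeping: it does not implement the identifiability algebra or any learning procedure.

```python
from itertools import combinations


def shared_content(view_latents):
    """Map every subset of two or more views to the latent indices they all share."""
    shared = {}
    views = sorted(view_latents)
    for size in range(2, len(views) + 1):
        for subset in combinations(views, size):
            # Content block of a subset = intersection of the views' latent sets.
            shared[subset] = set.intersection(*(view_latents[v] for v in subset))
    return shared


# Hypothetical example: three views, each a mixture of a subset of latents z1..z4.
view_latents = {
    "x1": {1, 2, 3},
    "x2": {1, 2, 4},
    "x3": {1, 3, 4},
}

blocks = shared_content(view_latents)
for subset, block in blocks.items():
    print(subset, sorted(block))
```

Here each pair of views shares a two-variable block, while all three views jointly share only latent 1, so considering several subsets of views yields a finer-grained decomposition than any single pair would.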
