A Sparsity Principle for Partially Observable Causal Representation Learning

Danru Xu; Dingling Yao; Sébastien Lachapelle; Perouz Taslakian; Julius von Kügelgen; Francesco Locatello; Sara Magliacane

TL;DR

This work tackles causal representation learning (CRL) when each observation is only partially informative about the latent variables, introducing an Unpaired Partial Observation framework. It proves identifiability under a sparsity principle for both linear and piecewise linear mixing functions. For linear mixing, the latents are recovered exactly up to permutation and diagonal scaling under a perfect-reconstruction (zero reconstruction error) constraint. For piecewise linear mixing, Gaussian latents and group-aware Gaussianity constraints yield the same identifiability. It then implements two estimation methods based on these results, replacing the $\ell_0$ sparsity constraint with an $\ell_1$ penalty and adding Gaussianity regularizers, and validates them on numerical simulations and image-based benchmarks (e.g., Multiple Balls and PartialCausal3DIdent). The methods recover the latents reliably across varying partial observability patterns, suggesting practical potential for interpretable CRL in settings with occlusions or missing data. Limitations include the reliance on known groupings and on Gaussianity in the nonlinear setting, motivating future work to extend identifiability to broader nonlinear regimes and weaker observability assumptions.
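The replacement of the $\ell_0$ sparsity constraint by an $\ell_1$ penalty can be illustrated for the linear case with a minimal NumPy sketch. It assumes a linear encoder/decoder pair and turns the perfect-reconstruction constraint into a squared-error penalty; the function name sparse_recon_loss and the weight lam are hypothetical, and this is not the authors' implementation.

```python
import numpy as np

def sparse_recon_loss(x, encoder_w, decoder_w, lam=0.1):
    """Squared reconstruction error plus an l1 penalty on the inferred latents.

    Hypothetical sketch (not the paper's code): x has shape (n, d_x), encoder_w
    has shape (d_x, d_z) and decoder_w has shape (d_z, d_x). The l1 norm is used
    as a convex surrogate for counting non-zero latents (the l0 "norm").
    """
    z_hat = x @ encoder_w                              # inferred latents, (n, d_z)
    x_rec = z_hat @ decoder_w                          # reconstruction, (n, d_x)
    recon = np.mean(np.sum((x - x_rec) ** 2, axis=1))  # penalty form of zero reconstruction error
    l1 = np.mean(np.sum(np.abs(z_hat), axis=1))        # average l1 norm per sample
    return recon + lam * l1


# Two observations, each generated by a single active latent (identity mixing).
x = np.eye(2)
theta = np.pi / 4
rot = np.array([[np.cos(theta), -np.sin(theta)],
                [np.sin(theta), np.cos(theta)]])

aligned = sparse_recon_loss(x, np.eye(2), np.eye(2), lam=1.0)
rotated = sparse_recon_loss(x, rot, rot.T, lam=1.0)
# Both encoders reconstruct x perfectly, but the rotated one spreads each
# sample over two latents and therefore pays a larger l1 cost.
```

Both encoders achieve zero reconstruction error, so only the sparsity term separates them: the axis-aligned encoder scores 1.0 while the rotated one scores $\sqrt{2}$, which is the intuition behind preferring the sparsest representation that still reconstructs the data.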

Abstract

Causal representation learning aims at identifying high-level causal variables from perceptual data. Most methods assume that all latent causal variables are captured in the high-dimensional observations. We instead consider a partially observed setting, in which each measurement only provides information about a subset of the underlying causal state. Prior work has studied this setting with multiple domains or views, each depending on a fixed subset of latents. Here, we focus on learning from unpaired observations from a dataset with an instance-dependent partial observability pattern. Our main contribution is to establish two identifiability results for this setting: one for linear mixing functions without parametric assumptions on the underlying causal model, and one for piecewise linear mixing functions with Gaussian latent causal variables. Based on these insights, we propose two methods for estimating the underlying causal variables by enforcing sparsity in the inferred representation. Experiments on different simulated datasets and established benchmarks highlight the effectiveness of our approach in recovering the ground-truth latents.
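To make the instance-dependent partial observability pattern concrete, here is a minimal simulation sketch of the data-generating process summarized in Figure 1: latent causal variables $\mathbf{C}$, per-sample binary masks $\mathbf{Y}$, masked causal variables $\mathbf{Z}$, and observations $\mathbf{X} = \mathbf{f}(\mathbf{Z})$. For illustration only, it assumes that masking multiplies $\mathbf{C}$ element-wise by $\mathbf{Y}$ (hidden entries become zero), that the latents are independent standard Gaussians rather than the output of a structural causal model, and that $\mathbf{f}$ is linear; the function and parameter names are hypothetical.

```python
import numpy as np

def sample_partial_observations(n, mixing, p_visible=0.7, seed=0):
    """Simulate unpaired, partially observed samples (hypothetical sketch).

    mixing has shape (d_z, d_x) with full row rank, so x = z @ mixing is an
    injective linear map from latents to observations.
    """
    rng = np.random.default_rng(seed)
    d_z = mixing.shape[0]
    c = rng.standard_normal((n, d_z))                     # causal variables C (independent here, for simplicity)
    y = (rng.random((n, d_z)) < p_visible).astype(float)  # per-sample binary masks Y (1 = visible)
    z = c * y                                             # masked causal variables Z (assumed: hidden -> 0)
    x = z @ mixing                                        # observations X = f(Z) with linear f
    return c, y, z, x


mixing = np.array([[1.0, 2.0, 0.0],
                   [0.0, 1.0, 1.0]])                      # d_z = 2 latents, d_x = 3 observed dims
c, y, z, x = sample_partial_observations(1000, mixing)
```

Each row has its own mask, so different samples reveal different subsets of latents and no two samples need to be paired, which mirrors the unpaired setting described above.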

Paper Structure (47 sections, 15 theorems, 60 equations, 32 figures, 16 tables)

Introduction
Problem setting
Causal variables $\mathbf{C}$.
Mask variables $\mathbf{Y}$.
Masked causal variables $\mathbf{Z}$.
Observations $\mathbf{X}$.
Identifiability via a Sparsity Principle
Linear Mixing Function
Is sparsity enough for identification for nonlinear $\mathbf{f}$?
Piecewise Linear Mixing Function
Non-zero mask values.
Implementation
Experimental Results
Numerical Experiments
Results for linear mixing function (Thm. 3.1).
...and 32 more sections

Key Result

Theorem 3.1

Assume the observation $\mathbf{X} = \mathbf{f}(\mathbf{Z})$ follows the data-generating process described in Sec. 2 (Problem setting), where $\mathbf{f}: \mathcal{Z} \to \mathcal{X}$ is an injective linear function, and that the sufficient support assumption holds. Let $\mathbf{g}: \mathcal{X}\rightarrow \mathbb{R}$ … then $\mathbf{Z}$ is identified by $\hat{\mathbf{f}}^{-1}(\mathbf{X})$ up to a permutation and element-wise …
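In the linear case, $\hat{\mathbf{f}}^{-1} \circ \mathbf{f}$ is a square matrix, so identification up to permutation and element-wise transformation (here, a scaling) means this matrix has exactly one non-zero entry in every row and every column. A small check of this property, as one might use to evaluate a learned linear encoder in simulation, is sketched below; the helper name is_scaled_permutation, the tolerance, and the example matrices are our own assumptions.

```python
import numpy as np

def is_scaled_permutation(m, tol=1e-8):
    """Return True if m equals P @ D for a permutation P and invertible diagonal D.

    Equivalently, m is square with exactly one entry of magnitude > tol in
    every row and every column.
    """
    m = np.asarray(m, dtype=float)
    if m.ndim != 2 or m.shape[0] != m.shape[1]:
        return False
    nonzero = np.abs(m) > tol
    return bool(np.all(nonzero.sum(axis=0) == 1) and np.all(nonzero.sum(axis=1) == 1))


# With row vectors, x = z @ A and z_hat = x @ W, so z_hat = z @ (A @ W).
A = np.array([[1.0, 2.0, 0.0],
              [0.0, 1.0, 1.0]])                  # ground-truth linear mixing (d_z = 2, d_x = 3)
W = np.linalg.pinv(A) @ np.array([[0.0, 3.0],
                                  [-0.5, 0.0]])  # an encoder that swaps and rescales the latents
composite = A @ W                                # plays the role of f_hat^{-1} composed with f
```

Because A has full row rank, A @ pinv(A) is the identity, so the composite map is exactly the swap-and-rescale matrix and passes the check, whereas a matrix that mixes two latents into one coordinate fails it.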

Figures (32)

Figure 1: (a) Motivating example for the Unpaired Partial Observation setting: a stationary camera taking pictures of a car park. We denote by $\mathbf{x}^1$ the image on day 1 and by $\mathbf{x}^2$ the image on day 2. The latent causal variables $\mathbf{c}^1$ and $\mathbf{c}^2$ represent the positions of four cars on each day. In $\mathbf{x}^1$ only $Car2$ and $Car3$ are visible, while in $\mathbf{x}^2$ all cars except $Car3$ are visible. This is represented by the ones in the binary mask variables $\mathbf{y}^1$ and $\mathbf{y}^2$. The latent causal variables $\mathbf{c}$ and the masks $\mathbf{y}$ together determine the masked causal variables $\mathbf{z}$, which the mixing function $\mathbf{f}$ uses to generate the images $\mathbf{x}$. (b) Causal model of the setting. Variables drawn with dotted lines are not directly observed; they are measured only through the observation $\mathbf{X}$. Our goal is to learn a representation $\hat{\mathbf{Z}}$ that identifies $\mathbf{Z}$ up to permutation and element-wise transformation.
Figure 2: Results for different parameters in the piecewise linear numerical experiments. Our method implements the piecewise linear loss (Eq. loss_pw) using the group information. The oracle method implements the same loss but additionally receives the mask $\mathbf{y}_i$, which it uses to assign a low variance to the masked variables in each sample for the skewness and kurtosis regularization terms. The oracle thus illustrates the potential of our theoretical results under a stronger Gaussianity constraint.
Figure 3: Comparison between (a) masking $C_2$ and (b) a do-intervention on $C_2$. In the second case, the intervention affects $C_3$ and cuts the link into $C_2$, hence removing the dependence between $C_1$ and $C_3$.
Figure 4: Level curves of the function $\hat{\mathbf{f}}^{-1} \circ \mathbf{f}$ from Example 3.1. The cold color scheme corresponds to the level curves of $(\hat{\mathbf{f}}^{-1} \circ \mathbf{f})_1(\mathbf{z})$, while the warm color scheme corresponds to $(\hat{\mathbf{f}}^{-1} \circ \mathbf{f})_2(\mathbf{z})$. The example gives a concrete case where all assumptions of Theorem 3.1 hold except the linearity of $\mathbf{f}$. Here $\hat{\mathbf{f}}^{-1} \circ \mathbf{f}$ is not a permutation composed with an element-wise invertible transformation: along the vertical dashed line, both of its components change.
Figure 5: Sensitivity analysis for $\epsilon$ ranging from $10^{-4}$ to $10^{4}$. Left: linear case; right: piecewise linear case.
...and 27 more figures
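Figure 2 refers to skewness and kurtosis regularization terms. One simple way to penalize non-Gaussian latent marginals along these lines is to sum the squared sample skewness and squared excess kurtosis of each inferred dimension, both of which are zero for a Gaussian. The sketch below assumes it is applied to a batch in which every scored dimension is expected to be Gaussian (for example, the unmasked latents of one observability group); the name gaussianity_penalty is hypothetical, and the sketch does not reproduce the paper's exact regularizer or its use of group or mask information.

```python
import numpy as np

def gaussianity_penalty(z_hat, eps=1e-8):
    """Sum over dimensions of squared sample skewness plus squared excess kurtosis.

    Both moments are 0 for a Gaussian, so the penalty is near 0 for Gaussian
    columns and grows as the marginals become skewed or heavy/light-tailed.
    """
    z = np.asarray(z_hat, dtype=float)
    centered = z - z.mean(axis=0)
    std = np.sqrt(np.mean(centered ** 2, axis=0)) + eps    # eps guards against constant columns
    standardized = centered / std
    skew = np.mean(standardized ** 3, axis=0)               # third standardized moment
    excess_kurt = np.mean(standardized ** 4, axis=0) - 3.0  # fourth standardized moment minus 3
    return float(np.sum(skew ** 2 + excess_kurt ** 2))


rng = np.random.default_rng(0)
gauss = rng.standard_normal((200_000, 2))
unif = rng.uniform(-1.0, 1.0, size=(200_000, 2))  # excess kurtosis of a uniform is -1.2
```

On samples this large, the Gaussian batch scores close to zero, while the uniform batch scores roughly 2 × 1.2² ≈ 2.88 (two dimensions, each with excess kurtosis -1.2), reflecting its light tails.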

Theorems & Definitions (31)

Definition 2.1
Theorem 3.1: Element-wise Identifiability for Linear $\mathbf{f}$
Example 3.1
Theorem 3.4: Element-wise Identifiability for Piecewise Linear $\mathbf{f}$
Definition 2.1
Lemma 2.2: Existence of permutation $\pi$ s.t. $i \in N_{\pi(i)}$
proof
Lemma 2.3: Element-wise Identifiability for Linear Transformation
proof
Theorem 2.3: Element-wise Identifiability for Linear $\mathbf{f}$
...and 21 more