Causal Representation Learning from Multiple Distributions: A General Setting

Kun Zhang; Shaoan Xie; Ignavier Ng; Yujia Zheng

TL;DR

This work studies causal representation learning from multiple distributions in a fully nonparametric setting, without assuming that hard interventions underlie the distribution changes. It shows that, under a sparsity constraint on the latent Markov network and sufficient domain-induced changes in the causal mechanisms, the latent Markov network and the latent variables can be recovered up to well-defined indeterminacies: the recovered Markov network is isomorphic to the true one, and in some cases the latent variables are identifiable up to component-wise transformations. It introduces SAF and SUCF as relaxed conditions that are necessary and sufficient to tie the latent Markov network to the moralized graph of the latent DAG, which provides a nonparametric bridge from conditional independencies to causal structure. A practical Change Encoding Network, built on a VAE framework with either a nonparametric normalizing-flow prior or a linear parametric prior, learns $Z$ and its causal relations from multi-domain data and is validated by simulations. The results clarify the limits of identifiability in purely observational, nonparametric settings and show how domain heterogeneity can enable causal representation learning without interventions.

Abstract

In many problems, the measured variables (e.g., image pixels) are just mathematical functions of the latent causal variables (e.g., the underlying concepts or objects). For the purpose of making predictions in changing environments or making proper changes to the system, it is helpful to recover the latent causal variables $Z_i$ and their causal relations represented by graph $\mathcal{G}_Z$. This problem has recently become known as causal representation learning. This paper is concerned with a general, completely nonparametric setting of causal representation learning from multiple distributions (arising from heterogeneous data or nonstationary time series), without assuming hard interventions behind distribution changes. We aim to develop general solutions in this fundamental case; as a by-product, this helps clarify the unique benefit offered by other assumptions such as parametric causal models or hard interventions. We show that, under the sparsity constraint on the recovered graph over the latent variables and suitable sufficient change conditions on the causal influences, one can recover the moralized graph of the underlying directed acyclic graph, and the recovered latent variables and their relations are related to the underlying causal model in a specific, nontrivial way. In some cases, most latent variables can even be recovered up to component-wise transformations. Experimental results verify our theoretical claims.
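To make the recovery target concrete, here is a minimal sketch of how the moralized graph of a DAG is formed: every directed edge loses its orientation, and every pair of parents sharing a child is connected. The function name `moralize` and the two edge sets are assumptions chosen for illustration; the figure captions name a Y-structure and a chain structure, but their exact edges are not given in this summary.

```python
from itertools import combinations


def moralize(parents):
    """Return the undirected edges of the moralized graph of a DAG.

    `parents` maps each node to the list of its parents.
    Each edge is a frozenset holding its two endpoints.
    """
    edges = set()
    for child, pa in parents.items():
        # Keep every directed edge, dropping its orientation.
        for p in pa:
            edges.add(frozenset((p, child)))
        # Connect ("marry") every pair of parents that share this child.
        for p, q in combinations(pa, 2):
            edges.add(frozenset((p, q)))
    return edges


def show(edges):
    """Render an edge set as a sorted list of 'A-B' strings."""
    return sorted("-".join(sorted(e)) for e in edges)


# Hypothetical edge sets, assumed for illustration only.
y_structure = {"Z1": [], "Z2": [], "Z3": ["Z1", "Z2"], "Z4": ["Z3"]}
chain = {"Z1": [], "Z2": ["Z1"], "Z3": ["Z2"], "Z4": ["Z3"]}

print(show(moralize(y_structure)))  # ['Z1-Z2', 'Z1-Z3', 'Z2-Z3', 'Z3-Z4']
print(show(moralize(chain)))        # ['Z1-Z2', 'Z2-Z3', 'Z3-Z4']
```

In the assumed Y-structure the collider at Z3 adds the extra edge Z1-Z2, so the moralized graph carries less information than the DAG; in the assumed chain there are no colliders, so the moralized graph is simply the skeleton.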

Paper Structure (22 sections, 21 theorems, 55 equations, 4 figures)

This paper contains 22 sections, 21 theorems, 55 equations, 4 figures.

Introduction
Problem Setting
Learning Causal Representations from Multiple Distributions
Recovering Latent Causal Variables and Latent Markov Network
From Latent Markov Network to Latent Causal DAG
Change Encoding Network for Representation Learning
Nonparametric Implementation of the Prior Distribution
Parametric Implementation of the Prior Distribution
Full Objective
Simulations
Related Work
Conclusion and Discussions
Proofs of Useful Lemmas
Proof of Lemma (lemma:nonzero_diagonal_entries)
Proof of Lemma (lemma:zero_submatrix)
...and 7 more sections

Key Result

Proposition 1

Let the observations be sampled from the data generating process in Eq. (eq:data_generating_process), and $\mathcal{M}_Z$ be the Markov network over $Z$. Suppose the following assumptions hold: Suppose that we learn $(\hat{g}, \hat{f},p_{\hat{Z}},\hat{\Theta})$ to achieve Eq. (eq:matched_distribution). Then, for every pair of estimated latent variables $\hat{Z}_k$ and $\hat{Z}_l$ that are not ad

Figures (4)

Figure 1: The generating process for each latent causal variable $Z_i$ changes, governed by a latent factor $\theta_i$. The observed variables $X$ are generated by $X = g(Z)$ with a nonlinear mixing function $g$.
Figure 2: Illustrative example 2.
Figure 3: Recovered latent variables vs. the true latent variables with the nonparametric approach. (a) Y-structure with Laplace noise. (b) Y-structure with Gaussian noise. (c) Chain structure with Laplace noise. (d) Chain structure with Gaussian noise. In each sub-figure, the $i$-th row and $j$-th column depicts the relationship between the estimated $\hat{Z}_i$ and the true component $Z_j$.
Figure 4: Recovered latent variables vs. the true latent variables with the linear parameterization approach. The $X$-axis denotes the components of the true latent variables $Z$ and the $Y$-axis represents the components of the estimated latent variables $\hat{Z}$. (a) Y-structure with Laplace noise. (b) Y-structure with Gaussian noise. (c) Chain structure with Laplace noise. (d) Chain structure with Gaussian noise.
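
The generating process described in the caption of Figure 1 can be sketched in a few lines, as a rough illustration rather than the authors' simulation code. Everything specific here is an assumption: the three-variable chain, the `tanh` mechanisms, the way each `theta[i]` modulates the mechanism of $Z_i$, the mixing matrix `A`, and the leaky-ReLU slope. The only parts taken from the source are that each $Z_i$ has a mechanism that changes with a domain-specific factor $\theta_i$ and that $X = g(Z)$ for a nonlinear mixing function $g$.

```python
import math
import random

# Assumed invertible mixing matrix (determinant 1.125), illustrative only.
A = [[1.0, 0.5, 0.0],
     [0.0, 1.0, 0.5],
     [0.5, 0.0, 1.0]]


def laplace(rng):
    """Standard Laplace draw: the difference of two Exp(1) variables."""
    return rng.expovariate(1.0) - rng.expovariate(1.0)


def mix(z):
    """Nonlinear mixing g: an invertible linear map followed by leaky ReLU."""
    lin = [sum(a * zi for a, zi in zip(row, z)) for row in A]
    return [v if v > 0 else 0.2 * v for v in lin]


def sample_domain(theta, n, rng):
    """Draw n pairs (z, x) from one domain of an assumed chain Z1 -> Z2 -> Z3.

    theta[i] changes the mechanism that generates Z_{i+1} in this domain.
    """
    samples = []
    for _ in range(n):
        z1 = theta[0] * laplace(rng)                 # scale of Z1 changes
        z2 = math.tanh(theta[1] * z1) + laplace(rng)  # influence Z1 -> Z2 changes
        z3 = math.tanh(theta[2] * z2) + laplace(rng)  # influence Z2 -> Z3 changes
        z = [z1, z2, z3]
        samples.append((z, mix(z)))
    return samples


# Hypothetical per-domain factors; only x would be observed by a learner.
domains = {"d0": [1.0, 0.5, 0.5], "d1": [2.0, 1.5, 0.2], "d2": [0.5, 3.0, 2.0]}
data = {name: sample_domain(theta, 5, random.Random(0))
        for name, theta in domains.items()}
print(data["d0"][0])
```

The mixing function is shared across domains while the latent mechanisms differ, which is the kind of heterogeneity the identifiability results rely on.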

Theorems & Definitions (37)

Proposition 1
Theorem 1: Relations among true and recovered latent causal variables
Theorem 2: Identifiability of latent Markov network
Theorem 3: Identifiability of latent causal variables
Remark 1
Example 1
Example 2
Corollary 1: Impossibility of finding independent components
Lemma 1
Proposition 2: Moralized graph and Markov network
...and 27 more
