Multi-Domain Causal Empirical Bayes Under Linear Mixing

Bohan Wu, Julius von Kügelgen, David M. Blei

Abstract

Causal representation learning (CRL) aims to learn low-dimensional causal latent variables from high-dimensional observations. While identifiability has been extensively studied for CRL, estimation has been less explored. In this paper, we explore the use of empirical Bayes (EB) to estimate causal representations. In particular, we consider the problem of learning from data from multiple domains, where differences between domains are modeled by interventions in a shared underlying causal model. Multi-domain CRL naturally poses a simultaneous inference problem that EB is designed to tackle. Here, we propose an EB $f$-modeling algorithm that improves the quality of learned causal variables by exploiting invariant structure within and across domains. Specifically, we consider a linear measurement model and interventional priors arising from a shared acyclic SCM. When the graph and intervention targets are known, we develop an EM-style algorithm based on causally structured score matching. We further discuss EB $g$-modeling in the context of existing CRL approaches. In experiments on synthetic data, our proposed method achieves more accurate estimation than other methods for CRL.

Paper Structure (38 sections, 2 theorems, 61 equations, 3 figures, 1 algorithm)


Key Result

Theorem 5.1

For all solutions $s^\star \in\mathop{\mathrm{arg\,min}}\limits_{s\in{\mathcal{S}}}{\bm{L}}^{\mathcal{G}}(s)$ and all $j\in[d_Z]$, the $j$th component $s^\star_j$ of $s^\star$ minimizes

Figures (3)

  • Figure 1: Assumed graphical model. Shaded nodes are observed, white nodes are latent. The inner plate is over realizations $i=1, ...,N_e$, the outer over domains $e=1,...,M$.
  • Figure 2: Example DAG with intervention targets and surrogate latents. The chain graph over $\bm{z}$ implies $z_1 \perp\!\!\!\perp z_3 \mid z_2$ and $z_j \perp\!\!\!\perp a_{k} \mid z_{j-1}$ for $k\neq j$. Yet, due to measurement error, $y_1 \not\perp\!\!\!\perp y_3 \mid y_2$ and $y_j \not\perp\!\!\!\perp a_l \mid {\bm{y}}_{-j}$ for $l\leq k$. This illustrates why $\nabla \log f_{\bm{a}}(\bm{y})$ is generally dense.
  • Figure 3: Empirical performance of CRL $f$-modeling. (Left column:) relative MSE (top) and Frobenius error of $\widehat{\bm{A}}$ (bottom). (Middle column:) scaling of relative MSE with $d_Z$ (top) and with $N_e$ (bottom) across $20$ runs, with $d_X = 100$. Solid curves show the median across runs and shaded bands show the interquartile range. (Right column:) per-environment relative MSE in the oracle setting (top; known $\bm{A}^\star,\sigma^{2\star}$) and in the learned setting (bottom; unknown $\bm{A}^\star,\sigma^{2\star}$).
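
The conditional-independence claim in the Figure 2 caption can be checked numerically in a simple special case. The sketch below is illustrative only and is not taken from the paper: it assumes a linear-Gaussian chain $z_1 \to z_2 \to z_3$ with unit-variance noise and a common edge weight `b`, and it assumes the surrogate latents are $\bm{y} = \bm{z} + \bm{\varepsilon}$ with independent Gaussian measurement noise of variance `sigma2`. All names and parameter values are hypothetical. For Gaussian variables, a zero partial correlation is equivalent to conditional independence, so the partial correlation between coordinates 1 and 3 given coordinate 2 serves as the test.

```python
import numpy as np


def partial_corr_13_given_2(cov):
    """Partial correlation of coordinates 1 and 3 given coordinate 2.

    Computed from the precision matrix; for a 3-dimensional Gaussian,
    conditioning on "all other variables" means conditioning on coordinate 2.
    """
    prec = np.linalg.inv(cov)
    return -prec[0, 2] / np.sqrt(prec[0, 0] * prec[2, 2])


# Assumed (hypothetical) parameters: edge weight and measurement-noise variance.
b = 1.0
sigma2 = 1.0

# Linear SCM for the chain z1 -> z2 -> z3: z = B z + e, with e ~ N(0, I).
B = np.array([[0.0, 0.0, 0.0],
              [b,   0.0, 0.0],
              [0.0, b,   0.0]])
M = np.linalg.inv(np.eye(3) - B)   # z = M e
cov_z = M @ M.T                    # covariance of the latents

# Assumed surrogate latents: y = z + independent Gaussian noise.
cov_y = cov_z + sigma2 * np.eye(3)

rho_z = partial_corr_13_given_2(cov_z)  # zero up to floating-point error
rho_y = partial_corr_13_given_2(cov_y)  # nonzero: noise breaks the Markov structure

print(f"partial corr(z1, z3 | z2) = {rho_z:.3e}")
print(f"partial corr(y1, y3 | y2) = {rho_y:.3f}")
```

Under these assumptions the precision matrix of $\bm{z}$ has a zero in its $(1,3)$ entry, while the precision matrix of $\bm{y}$ does not, which is consistent with the caption's point that the score of the noisy marginal is generally dense.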

Theorems & Definitions (7)

  • Definition 3.2: Acyclic SCM
  • Definition 3.4: Interventions
  • Theorem 5.1
  • Theorem B.1
  • Definition E.1: Element-identifiability / Disentanglement
  • Definition E.2: Scale-permutation-identifiability
  • Definition E.3: Mixing-identifiability