Coupling Generative Modeling and an Autoencoder with the Causal Bridge

Ruolin Meng, Ming-Yu Chung, Dhanajit Brahma, Ricardo Henao, Lawrence Carin

TL;DR

The paper addresses the estimation of causal effects under unobserved confounding by using two sets of proxies through a causal bridge function. It couples a generative latent-variable model with an autoencoder that shares information across the observed proxies, treatment, and outcomes, and it derives a bound on the average error, expressed in information-theoretic quantities, for cases where the causal-bridge assumptions are violated. The authors extend the causal-bridge framework to survival outcomes and validate the method on synthetic datasets and real-world Framingham data, reporting improvements over state-of-the-art proxy-based methods and agreement with randomized trials. The result is a principled, learnable approach to causal inference with proxy measurements that integrates generative modeling with causal identification.

Abstract

We consider inferring the causal effect of a treatment (intervention) on an outcome of interest in situations where there is potentially an unobserved confounder influencing both the treatment and the outcome. This is achievable by assuming access to two separate sets of control (proxy) measurements associated with the treatment and the outcome, which are used to estimate treatment effects through a function termed the causal bridge (CB). We present a new theoretical perspective, the associated assumptions under which estimating treatment effects with the CB is feasible, and a bound on the average error of the treatment effect when the CB assumptions are violated. From this new perspective, we then demonstrate how coupling the CB with an autoencoder architecture allows for the sharing of statistical strength between observed quantities (proxies, treatment, and outcomes), thus improving the quality of the CB estimates. Experiments on synthetic and real-world data demonstrate the effectiveness of the proposed approach relative to state-of-the-art methods for proxy measurements.
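
To make the proxy idea concrete, the following is a minimal sketch in a linear-Gaussian toy setting. It is not the paper's method (no generative model or autoencoder is involved); it uses a simple two-stage least-squares estimator of the kind used in the proximal causal inference literature. The data-generating process, the coefficient `true_beta`, and all function names are assumptions made for illustration only. In this toy model a bridge of the form $b(x, w) = \beta x + w$ satisfies $\mathbb{E}[Y | x, z] = \mathbb{E}[b(x, W) | x, z]$, so regressing $Y$ on $X$ and the fitted value of $W$ given $(X, Z)$ recovers $\beta$, whereas a direct regression of $Y$ on $X$ is biased by the confounder.

```python
import numpy as np


def simulate(n, beta, rng):
    """Toy linear-Gaussian data (an assumed setup, not from the paper)."""
    u = rng.normal(size=n)                        # unobserved confounder U
    z = u + rng.normal(size=n)                    # treatment-side proxy Z
    w = u + rng.normal(size=n)                    # outcome-side proxy W
    x = u + rng.normal(size=n)                    # treatment X, confounded by U
    y = beta * x + u + 0.5 * rng.normal(size=n)   # outcome Y
    return x, y, z, w


def ols(features, target):
    """Least-squares fit with an intercept; returns [intercept, coefs...]."""
    design = np.column_stack([np.ones(len(target))] + list(features))
    coef, *_ = np.linalg.lstsq(design, target, rcond=None)
    return coef


def naive_effect(x, y):
    """Regress Y on X only; biased because U drives both X and Y."""
    return ols([x], y)[1]


def proximal_2sls_effect(x, y, z, w):
    """Two-stage least squares with proxies.

    Stage 1: fit E[W | X, Z] linearly.
    Stage 2: regress Y on X and the stage-1 fitted values.
    The coefficient on X estimates the causal slope in this linear model.
    """
    a = ols([x, z], w)
    w_hat = a[0] + a[1] * x + a[2] * z
    return ols([x, w_hat], y)[1]


rng = np.random.default_rng(0)
true_beta = 2.0
x, y, z, w = simulate(20_000, true_beta, rng)

naive_beta = naive_effect(x, y)
bridge_beta = proximal_2sls_effect(x, y, z, w)
print(f"true slope:          {true_beta:.3f}")
print(f"naive OLS slope:     {naive_beta:.3f}")
print(f"proxy (2SLS) slope:  {bridge_beta:.3f}")
```

In this setup the naive slope is inflated by roughly $\mathrm{Cov}(U, X)/\mathrm{Var}(X) = 0.5$, while the proxy-based estimate stays close to `true_beta`. The linear form is a deliberate simplification; it only illustrates why two proxies can correct for a confounder that is never observed.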

Paper Structure

This paper contains 30 sections, 69 equations, 16 figures, 3 tables.

Figures (16)

  • Figure 1: Graphical model for the causal-inference problem. $U$ is the unobserved confounder, $X$ is the treatment, $Y$ is the outcome of interest, and $Z$ and $W$ are the treatment and outcome controls, respectively. The dashed lines represent dependencies that may or may not be present.
  • Figure 2: Relative error ($r(\eta)$) vs. mutual information ($I(U ; Z | W,x)$), both averaged over $X$. Each line represents a value of $\sigma_Z$ for increasing values of $\sigma_W \in \{0.1,0.25,0.5,0.75,1\}$, with $\sigma_X=0.1$, which are consistent with $I(U ; Z | W,x)$.
  • Figure 3: Out-of-sample MSE results for (Left) Demand and (Middle) dSprite data. (Right) Hazard-ratio (HR) results with 95% CIs for Framingham data. The different methods are listed along the x-axis, including results from the RCT, with which CB + AE agrees best. The red and green dashed lines correspond to the null HR$=1$ and the reference (mean RCT estimate), respectively.
  • Figure 5: Heatmap of the mean of $\frac{\eta}{|\mathbb{E}[Y|x,Z]|}$
  • Figure 6: Heatmap of the standard deviation of $\frac{\eta}{|\mathbb{E}[Y|x,Z]|}$
  • ...and 11 more figures