Disentangled Graph Autoencoder for Treatment Effect Estimation

Di Fan, Renlei Jiang, Yunhao Wen, Chuanhou Gao

TL;DR

This work tackles the challenge of estimating individual treatment effects (ITE) from networked observational data in the presence of latent confounders. It introduces TNDVGA, a disentangled variational graph autoencoder that partitions latent information into four categories: instrumental ($\mathbf{z}_t$), confounding ($\mathbf{z}_c$), adjustment ($\mathbf{z}_y$), and noise ($\mathbf{z}_o$), and enforces their independence via the Hilbert-Schmidt Independence Criterion (HSIC). The model jointly optimizes an ELBO-based objective together with losses for treatment prediction, outcome prediction, independence regularization, and balanced representation, enabling accurate ITE estimation even when the unconfoundedness assumption fails. Extensive experiments on synthetic and semi-synthetic networked datasets demonstrate state-of-the-art performance, highlighting the practical value of combining disentangled latent-factor learning with network structure for counterfactual inference in domains such as healthcare and policy.
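To make the independence term concrete, here is a minimal sketch of the standard biased empirical HSIC estimator between two batches of latent factors. It assumes a Gaussian (RBF) kernel with a fixed bandwidth; the kernel, the bandwidth, and the function names `rbf_kernel` and `hsic` are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

def rbf_kernel(x, sigma=1.0):
    """Gaussian kernel matrix for the rows of x (assumed kernel choice)."""
    sq = np.sum(x ** 2, axis=1, keepdims=True)
    d2 = np.maximum(sq + sq.T - 2.0 * x @ x.T, 0.0)  # squared distances
    return np.exp(-d2 / (2.0 * sigma ** 2))

def hsic(x, y, sigma=1.0):
    """Biased empirical HSIC: trace(K H L H) / (n - 1)^2."""
    n = x.shape[0]
    K = rbf_kernel(x, sigma)
    L = rbf_kernel(y, sigma)
    H = np.eye(n) - np.ones((n, n)) / n  # centering matrix
    return float(np.trace(K @ H @ L @ H) / (n - 1) ** 2)

# Hypothetical latent batches: independent factors should score near zero,
# while identical factors score clearly higher.
rng = np.random.default_rng(0)
z_t = rng.normal(size=(200, 2))
z_c = rng.normal(size=(200, 2))
print(hsic(z_t, z_c), hsic(z_t, z_t))
```

In a training loop, a penalty of this kind would typically be summed over pairs of the four latent blocks and added to the objective with a weight; how TNDVGA weights and pairs the terms is not specified in this summary.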

Abstract

Treatment effect estimation from observational data has attracted significant attention across various research fields. However, many widely used methods rely on the unconfoundedness assumption, which is often unrealistic due to the inability to observe all confounders, thereby overlooking the influence of latent confounders. To address this limitation, recent approaches have utilized auxiliary network information to infer latent confounders, relaxing this assumption. However, these methods often treat observed variables and networks as proxies only for latent confounders, which can result in inaccuracies when certain variables influence treatment without affecting outcomes, or vice versa. This conflation of distinct latent factors undermines the precision of treatment effect estimation. To overcome this challenge, we propose a novel disentangled variational graph autoencoder for treatment effect estimation on networked observational data. Our graph encoder disentangles latent factors into instrumental, confounding, adjustment, and noisy factors, while enforcing factor independence using the Hilbert-Schmidt Independence Criterion. Extensive experiments on multiple networked datasets demonstrate that our method outperforms state-of-the-art approaches.

Paper Structure

This paper contains 33 sections, 1 theorem, 23 equations, 6 figures, and 5 tables.

Key Result

Theorem 1

If we recover $p(\mathbf{z}_c,\mathbf{z}_y\mid\mathbf{x},\mathbf{A})$ and $p(y\mid t,\mathbf{z}_c,\mathbf{z}_y)$, then the proposed TNDVGA can recover the individual treatment effect from networked observational data.
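One way to see why these two distributions suffice is the usual adjustment argument, sketched below. It assumes the causal diagram of Figure 1, namely that $(\mathbf{z}_c,\mathbf{z}_y)$ are not affected by $t$, that they block all back-door paths from $t$ to $y$, and that $y$ is independent of $(\mathbf{x},\mathbf{A})$ given $(t,\mathbf{z}_c,\mathbf{z}_y)$. This is a reading of the theorem statement, not a reproduction of the paper's proof.

$$
\begin{aligned}
\mathrm{ITE}(\mathbf{x},\mathbf{A})
&= \mathbb{E}\left[y \mid \mathbf{x},\mathbf{A},do(t=1)\right] - \mathbb{E}\left[y \mid \mathbf{x},\mathbf{A},do(t=0)\right] \\
&= \int \Big(\mathbb{E}\left[y \mid t=1,\mathbf{z}_c,\mathbf{z}_y\right] - \mathbb{E}\left[y \mid t=0,\mathbf{z}_c,\mathbf{z}_y\right]\Big)\, p(\mathbf{z}_c,\mathbf{z}_y \mid \mathbf{x},\mathbf{A})\, d\mathbf{z}_c\, d\mathbf{z}_y .
\end{aligned}
$$

Every quantity on the right-hand side is determined by $p(\mathbf{z}_c,\mathbf{z}_y\mid\mathbf{x},\mathbf{A})$ and $p(y\mid t,\mathbf{z}_c,\mathbf{z}_y)$, which is why recovering them is enough to compute the ITE.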

Figures (6)

  • Figure 1: The causal diagram of the proposed TNDVGA. $\mathbf{x}$ represents the observed variables, $\mathbf{A}$ denotes the network structure, $t$ is the treatment, $y$ is the outcome, $\mathbf{z}_t$ denotes the latent instrumental factors affecting only the treatment, $\mathbf{z}_c$ the latent confounding factors, $\mathbf{z}_y$ the latent adjustment factors affecting only the outcome, and $\mathbf{z}_o$ the latent noise factors unrelated to both treatment and outcome.
  • Figure 2: The overall architecture of TNDVGA consists of a generative network and an inference network for disentangling latent factors.
  • Figure 3: Experimental results of different methods in ITE estimation under different levels of selection bias. As the selection bias increases, TNDVGA consistently performs the best.
  • Figure 4: In the radar chart, each vertex of the polygon is labeled with a sequence of latent factor dimensions from the synthetic dataset. For example, $8\text{-}8\text{-}8\text{-}8$ indicates that the dataset is generated using 8 dimensions each for latent instrumental factors, latent confounding factors, latent adjustment factors, and latent noise factors. Each polygon represents the PEHE metric of the model (smaller polygons indicate better performance).
  • Figure 5: Hyperparameter analysis on BlogCatalog across different $\kappa_2$.
  • ...and 1 more figure
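The PEHE metric mentioned in the Figure 4 caption is commonly computed as the root mean squared error between estimated and true ITEs. The sketch below follows that common definition; whether the paper reports the root or the squared form is not stated in this summary, and the function name `sqrt_pehe` is hypothetical.

```python
import numpy as np

def sqrt_pehe(ite_true, ite_pred):
    """Root PEHE: sqrt of the mean squared error between true and estimated ITEs."""
    ite_true = np.asarray(ite_true, dtype=float)
    ite_pred = np.asarray(ite_pred, dtype=float)
    return float(np.sqrt(np.mean((ite_pred - ite_true) ** 2)))

# Toy example with made-up values: a perfect estimate gives 0.
print(sqrt_pehe([1.0, 2.0, 3.0], [1.0, 2.0, 3.0]))
```

Because it needs the true ITE of every unit, this metric can only be evaluated on synthetic or semi-synthetic data where both potential outcomes are simulated.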

Theorems & Definitions (2)

  • Definition 1: Learning ITEs from Networked Observational Data
  • Theorem 1: Identifiability of ITE