CAETC: Causal Autoencoding and Treatment Conditioning for Counterfactual Estimation over Time

Nghia D. Nguyen, Pablo Robles-Granda, Lav R. Varshney

Abstract

Counterfactual estimation over time is important in various applications, such as personalized medicine. However, time-dependent confounding bias in observational data still poses a significant challenge in achieving accurate and efficient estimation. We introduce causal autoencoding and treatment conditioning (CAETC), a novel method for this problem. Built on adversarial representation learning, our method leverages an autoencoding architecture to learn a partially invertible and treatment-invariant representation, where the outcome prediction task is cast as applying a treatment-specific conditioning on the representation. Our design is independent of the underlying sequence model and can be applied to existing architectures such as long short-term memories (LSTMs) or temporal convolution networks (TCNs). We conduct extensive experiments on synthetic, semi-synthetic, and real-world data to demonstrate that CAETC yields significant improvement in counterfactual estimation over existing methods.

Paper Structure (42 sections, 7 theorems, 47 equations, 5 figures, 8 tables, 1 algorithm)

Key Result

Theorem 4.1

For a fixed $t$, there exists a pair $(\Phi, F^B)$ that satisfies the equilibrium in eq:adv-game. This equilibrium holds if and only if $\Phi$ satisfies eq:treatment-invariant.
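
As intuition for the balancing side of this result, the figure captions describe balancing as maximizing the entropy of the treatment balancer $F^B$. The sketch below is only an illustration under stated assumptions, not the objective in eq:adv-game: it assumes a categorical treatment with four options and a softmax balancer output, and the function names and logits are hypothetical. It shows that entropy is largest when the balancer's prediction is uniform, that is, when the prediction carries no information about which treatment was assigned.

```python
import math


def softmax(logits):
    """Convert raw scores into a categorical distribution."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs):
    """Shannon entropy (in nats) of a categorical distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


# Hypothetical balancer outputs over four treatment options.
confident = softmax([4.0, 0.0, 0.0, 0.0])  # representation leaks the treatment
uniform = softmax([0.0, 0.0, 0.0, 0.0])    # representation reveals nothing

entropy_confident = entropy(confident)
entropy_uniform = entropy(uniform)
max_entropy = math.log(4)  # upper bound for four categories
```

Here `entropy_uniform` reaches the upper bound `max_entropy`, while `entropy_confident` falls below it, which is why pushing the balancer's entropy up discourages the representation from encoding the treatment.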

Figures (5)

  • Figure 1: Causal graph for time-dependent confounding over $\mathcal{H}_t$.
  • Figure 2: A. Prior works concatenate the representation $\Phi(\mathcal{H}_t)$ with the treatment $A_{t+1}$ to predict the next outcomes with $F^Y$. We instead model the treatment $A_{t+1}$ as a transformation applied to the representation before it is forwarded to the respective heads. B. The history $\mathcal{H}_t$ is encoded into the representation $\Phi(\mathcal{H}_t)$, which is then forwarded to the respective heads for decoding. The treatment, outcome, and time-varying covariate estimators $F^A$, $F^Y$, and $F^X$ reconstruct $A_{t}$, $\bm{Y}_t$, and $\bm{X}_t$. C. Simultaneously, balancing is applied to $\Phi(\mathcal{H}_t)$ by maximizing the entropy of the treatment balancer $F^B$. D. An $A_{t+1}$-specific transformation is applied to $\Phi(\mathcal{H}_t)$ before it is decoded into the next outcomes by $F^Y$. E. The conditioning layer $F^C$ is encouraged to learn treatment-specific transformations of $\Phi(\mathcal{H}_t)$.
  • Figure 3: To handle the input mismatch between the history $\mathcal{H}_{T_0}=\{\bm{V}, A_{t}, \bm{Y}_{t}, \bm{X}_{t}\}_{t=1}^{T_0}$ and the autoregressively decoded sequence $\{\bm{V}, A_{t}, \bm{Y}_{t}\}_{t=T_0 +1}^{T_0+\tau}$, we replace the future time-varying covariates $\{\bm{X}_{t}\}_{t=T_0+1}^{T_0+\tau}$ with a learnable vector $\bm{M}$.
  • Figure 4: RMSEs for NSCLC fully synthetic data on random trajectories with increasing levels of time-dependent confounding on the training dataset. Each bar represents the RMSE for 5-step-ahead predictions. Lower is better.
  • Figure 5: RMSEs for NSCLC fully synthetic data on the no-confounding test set ($\gamma=0$) with increasing levels of time-dependent confounding in the training dataset. Training on $\gamma=0$ and testing on $\gamma=0$ represents the performance upper bound for each method. Due to time-dependent confounding, training on $\gamma \neq 0$ and testing on $\gamma = 0$ is expected to reduce performance. Lower is better.
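
The input handling described in the Figure 3 caption can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function `build_step_input`, its argument names, and the toy values are hypothetical, and the placeholder `M` is a fixed list here, whereas the caption describes a learnable vector. The point is only that observed steps and autoregressively decoded steps end up with the same input width.

```python
from typing import List, Optional, Sequence


def build_step_input(
    static: Sequence[float],
    treatment: Sequence[float],
    outcome: Sequence[float],
    covariates: Optional[Sequence[float]],
    placeholder: Sequence[float],
) -> List[float]:
    """Assemble one time step of model input as [V, A_t, Y_t, X_t or M]."""
    # Observed steps (t <= T_0) carry real covariates; decoded steps
    # (t > T_0) have none, so the placeholder M fills the same slot.
    filled = placeholder if covariates is None else covariates
    if len(filled) != len(placeholder):
        raise ValueError("covariates and placeholder must have equal length")
    return [*static, *treatment, *outcome, *filled]


# Hypothetical toy values; in a trained model M would be a learned parameter.
V = [1.0]
M = [0.0, 0.0]
observed_step = build_step_input(V, [1.0], [0.5], [0.3, 0.7], M)
decoded_step = build_step_input(V, [0.0], [0.4], None, M)
```

Both `observed_step` and `decoded_step` have five entries, so the same sequence model can consume the history and the decoded future steps without a change in input shape.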

Theorems & Definitions (12)

  • Theorem 4.1
  • Theorem 4.2
  • Lemma 2.1
  • proof
  • Theorem 2.1
  • proof
  • Lemma 3.1
  • proof
  • Lemma 3.2
  • proof
  • ...and 2 more