CAETC: Causal Autoencoding and Treatment Conditioning for Counterfactual Estimation over Time

Nghia D. Nguyen; Pablo Robles-Granda; Lav R. Varshney

Abstract

Counterfactual estimation over time is important in various applications, such as personalized medicine. However, time-dependent confounding bias in observational data still poses a significant challenge in achieving accurate and efficient estimation. We introduce causal autoencoding and treatment conditioning (CAETC), a novel method for this problem. Built on adversarial representation learning, our method leverages an autoencoding architecture to learn a partially invertible and treatment-invariant representation, where the outcome prediction task is cast as applying a treatment-specific conditioning on the representation. Our design is independent of the underlying sequence model and can be applied to existing architectures such as long short-term memories (LSTMs) or temporal convolution networks (TCNs). We conduct extensive experiments on synthetic, semi-synthetic, and real-world data to demonstrate that CAETC yields significant improvement in counterfactual estimation over existing methods.

Paper Structure (42 sections, 7 theorems, 47 equations, 5 figures, 8 tables, 1 algorithm)

Introduction
Related Works
Counterfactual estimation over time
Learning treatment-invariant representation
Problem Formulation
Outcome forecasting
Method
Architecture overview
Input autoencoding
Outcomes prediction via treatment conditioning
Treatment-invariant representation learning
Treatment-specific conditioning
Autoregressive decoding
Overall training objective
Experiments
...and 27 more sections

Key Result

Theorem 4.1

For a fixed $t$, there exists a pair $(\Phi, F^B)$ that attains the equilibrium in eq:adv-game. The equilibrium holds if and only if $\Phi$ satisfies the treatment-invariance condition eq:treatment-invariant.
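
The caption of Figure 2 (panel C) describes balancing as maximizing the entropy of the treatment balancer $F^B$. As a rough illustration of why entropy is a sensible signal here, the sketch below computes the Shannon entropy of a categorical prediction over treatments. It is not the paper's loss: the function name `entropy`, the number of treatments, and the example probability vectors are all assumptions made for illustration.

```python
import math


def entropy(probs):
    """Shannon entropy (in nats) of a categorical distribution."""
    if abs(sum(probs) - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    # Terms with p == 0 contribute nothing (the limit of p * log p is 0).
    return -sum(p * math.log(p) for p in probs if p > 0.0)


# Hypothetical balancer outputs over K = 4 treatment options.
confident = [0.97, 0.01, 0.01, 0.01]  # prediction strongly favors one treatment
uniform = [0.25, 0.25, 0.25, 0.25]    # prediction has no preference

h_confident = entropy(confident)
h_uniform = entropy(uniform)
```

For $K$ categories the entropy is bounded above by $\log K$, and the bound is reached only by the uniform distribution, so pushing the balancer's entropy up pushes its predictions away from singling out any one treatment.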

Figures (5)

Figure 1: Causal graph for time-dependent confounding over $\mathcal{H}_t$.
Figure 2: A. Prior works concatenate the representation $\Phi(\mathcal{H}_t)$ with the treatment $A_{t+1}$ and predict the next outcomes with $F^Y$. In contrast, we model the treatment $A_{t+1}$ as a transformation applied to the representation before it is forwarded to the respective heads. B. The history $\mathcal{H}_t$ is encoded into the representation $\Phi(\mathcal{H}_t)$, which is forwarded to the respective heads for decoding. The treatment, outcome, and time-varying covariate estimators $F^A$, $F^Y$, and $F^X$ reconstruct $A_{t}$, $\bm{Y}_t$, and $\bm{X}_t$. C. Simultaneously, balancing is applied to $\Phi(\mathcal{H}_t)$ by maximizing the entropy of the treatment balancer $F^B$. D. An $A_{t+1}$-specific transformation is applied to $\Phi(\mathcal{H}_t)$ before it is decoded into the next outcomes by $F^Y$. E. The conditioning layer $F^C$ is encouraged to learn treatment-specific transformations of $\Phi(\mathcal{H}_t)$.
Figure 3: To handle the input mismatch between the history $\mathcal{H}_{T_0}=\{\bm{V}, A_{t}, \bm{Y}_{t}, \bm{X}_{t}\}_{t=1}^{T_0}$ and the autoregressively decoded sequences $\{\bm{V}, A_{t}, \bm{Y}_{t}\}_{t=T_0 +1}^{T_0+\tau}$, we replace the future time-varying covariates $\{\bm{X}_t\}_{t=T_0+1}^{T_0+\tau}$ with a learnable vector $\bm{M}$.
Figure 4: RMSEs for NSCLC fully synthetic data on random trajectories with increasing levels of time-dependent confounding on the training dataset. Each bar represents the RMSE for 5-step-ahead predictions. Lower is better.
Figure 5: RMSEs for NSCLC fully synthetic data on the no-confounding test set ($\gamma=0$) with increasing levels of time-dependent confounding on the training dataset. Training on $\gamma=0$ and testing on $\gamma=0$ represents the performance upper bound for each method. Due to time-dependent confounding, training on $\gamma \neq 0$ and testing on $\gamma = 0$ is expected to reduce performance. Lower is better.
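
The input handling described in the Figure 3 caption can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `build_inputs`, the list-based layout, and the example values are hypothetical, and the placeholder `mask` stands in for the learnable vector $\bm{M}$, which in a real model would be a trained parameter rather than a fixed list.

```python
def build_inputs(static, treatments, outcomes, covariates, mask, t0):
    """Assemble per-step inputs [V, A_t, Y_t, X_t], using M once X_t is unavailable.

    static:      list of floats, static covariates V (repeated at every step)
    treatments:  list of scalars A_t for all t0 + tau steps
    outcomes:    list of lists Y_t for all steps (future ones are model predictions)
    covariates:  list of lists X_t for the first t0 steps only
    mask:        list of floats standing in for the learnable vector M
    t0:          number of observed history steps
    """
    if len(treatments) != len(outcomes):
        raise ValueError("treatments and outcomes must cover the same steps")
    if len(covariates) != t0:
        raise ValueError("covariates must be given for exactly t0 steps")
    if any(len(x) != len(mask) for x in covariates):
        raise ValueError("mask must have the same width as the covariates")

    steps = []
    for t, (a, y) in enumerate(zip(treatments, outcomes)):
        # History steps use observed covariates; future steps use the placeholder.
        x = covariates[t] if t < t0 else mask
        steps.append(list(static) + [a] + list(y) + list(x))
    return steps


# Hypothetical example: 2 observed steps followed by 2 decoded steps.
inputs = build_inputs(
    static=[1.0],
    treatments=[0, 1, 1, 0],
    outcomes=[[0.5], [0.6], [0.7], [0.8]],
    covariates=[[2.0, 3.0], [2.1, 3.1]],
    mask=[0.0, 0.0],
    t0=2,
)
```

Because the placeholder has the same width as the covariates, every step has the same input dimension, so the same sequence model can consume both the observed history and the autoregressively decoded steps.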

Theorems & Definitions (12)

Theorem 4.1
Theorem 4.2
Lemma 2.1
proof
Theorem 2.1
proof
Lemma 3.1
proof
Lemma 3.2
proof
...and 2 more
