Counterfactual Generative Models for Time-Varying Treatments

Shenghao Wu; Wenbin Zhou; Minshuo Chen; Shixiang Zhu

TL;DR

This work tackles the challenge of estimating high-dimensional counterfactual outcomes under time-varying treatments by introducing a conditional generative framework that bypasses explicit density estimation. Using inverse probability of treatment weighting (IPTW) within marginal structural generative models, the authors train flexible generators (a diffusion model and a CVAE) to produce samples from a proxy conditional distribution that approximates the true counterfactual distribution $f_{\overline{a}}$. The framework outperforms several baselines on fully synthetic, semi-synthetic, and real COVID-19 data, particularly in capturing distributional features beyond the mean and in high-dimensional settings (e.g., TV-MNIST with $m=784$). The method enables uncertainty quantification and policy analysis by revealing multi-modal and region-specific counterfactual outcomes, with practical implications for public health decisions under time-varying interventions. Future work includes extending the method to continuous treatments, addressing potential violations of positivity or unmeasured confounding, and incorporating more advanced generative models to further improve sample fidelity.

Abstract

Estimating the counterfactual outcome of treatment is essential for decision-making in public health and clinical science, among others. Often, treatments are administered in a sequential, time-varying manner, so the number of possible counterfactual outcomes grows exponentially. Furthermore, in modern applications, the outcomes are high-dimensional, and conventional average treatment effect estimation fails to capture disparities among individuals. To tackle these challenges, we propose a novel conditional generative framework capable of producing counterfactual samples under time-varying treatment, without the need for explicit density estimation. Our method carefully addresses the mismatch between the observed and counterfactual distributions via a loss function based on inverse probability re-weighting, and supports integration with state-of-the-art conditional generative models such as guided diffusion and the conditional variational autoencoder. We present a thorough evaluation of our method using both synthetic and real-world data. Our results demonstrate that our method is capable of generating high-quality counterfactual samples and outperforms the state-of-the-art baselines.

Paper Structure (56 sections, 2 theorems, 40 equations, 11 figures, 8 tables, 2 algorithms)

Introduction
Related work
Causal inference with time-varying treatments
Counterfactual distribution estimation
Counterfactual generative model
Methodology
Problem setup
Counterfactual generative framework for time-varying treatments
Marginal structural generative models
Classifier-free guided diffusion model
Conditional variational autoencoder
Experiments
Experiment set-up
Fully synthetic data
Semi-synthetic data
...and 41 more sections

Key Result

Lemma 1

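Under unconfoundedness and positivity, we have

$$f_{\overline{a}}(y) = \int \prod_{\tau} \frac{1}{f\left(a_{\tau}|\overline{a}_{\tau-1},\overline{x}_{\tau}\right)} \, f(y,\overline{a},\overline{x}) \, d\overline{x},$$

where $f(y,\overline{a},\overline{x})$ denotes the joint distribution, $f\left(a_{\tau}|\overline{a}_{\tau-1},\overline{x}_{\tau}\right)$ denotes the propensity score at $\tau$, and the product runs over the time steps $\tau$ of the treatment history $\overline{a}$.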
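
As a rough illustration of the inverse-propensity re-weighting behind Lemma 1, the sketch below works through a toy case with a single time step. Everything in it is an assumption made for illustration and not taken from the paper: the binary confounder `X`, the binary treatment `A`, the probability tables, the deterministic outcome `Y = X + A`, and all function names. It divides the joint mass by the propensity score and sums out the covariate, then compares the result with the naive conditional distribution of `Y` given `A = a`.

```python
import numpy as np

# Hypothetical toy model (not from the paper): one time step,
# binary confounder X and binary treatment A.
p_x = np.array([0.5, 0.5])            # P(X = x)
p_a_given_x = np.array([[0.8, 0.2],   # P(A = a | X = 0)
                        [0.2, 0.8]])  # P(A = a | X = 1), the propensity score


def outcome(x, a):
    """Deterministic toy outcome Y = X + A, taking values in {0, 1, 2}."""
    return x + a


def iptw_weight(step_propensities):
    """Weight for one trajectory: inverse of the product of per-step propensities."""
    return 1.0 / np.prod(step_propensities)


def counterfactual_pmf_iptw(a):
    """Sum over x of f(y, a, x) / f(a | x), the discrete analogue of the integral."""
    pmf = np.zeros(3)
    for x in (0, 1):
        joint = p_x[x] * p_a_given_x[x, a]  # joint mass placed at y = outcome(x, a)
        pmf[outcome(x, a)] += joint * iptw_weight([p_a_given_x[x, a]])
    return pmf


def observed_conditional_pmf(a):
    """Naive P(Y = y | A = a), which is biased by the confounder X."""
    pmf = np.zeros(3)
    for x in (0, 1):
        pmf[outcome(x, a)] += p_x[x] * p_a_given_x[x, a]
    return pmf / pmf.sum()


if __name__ == "__main__":
    print("IPTW estimate of the counterfactual pmf:", counterfactual_pmf_iptw(1))
    print("Naive conditional pmf given A = 1:", observed_conditional_pmf(1))
```

In this toy model the true counterfactual outcome under `a = 1` is `Y(1) = X + 1`, so it places mass 0.5 on each of the values 1 and 2. The re-weighted sum recovers exactly that, whereas the naive conditional places mass 0.8 on the value 2 because treated units are more likely to have `X = 1`. For a longer treatment history, `iptw_weight` would receive one propensity per time step.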

Figures (11)

Figure 1: Challenges in estimating the counterfactual outcomes of time-varying treatments. Left: The mean is incapable of describing the heterogeneous effect in counterfactual distributions. Middle: In a realistic scenario where the treatment is the state-level mask mandate, the outcome is a $67$-dimensional vector, corresponding to the number of COVID-19 cases of the $67$ counties in Pennsylvania. Right: The longer the dependence on the treatment history, the greater the distributional mismatch tends to be. Here $d$ denotes the length of the history dependence.
Figure 2: The causal directed acyclic graph (DAG) of the time-varying treatment.
Figure 3: The architecture of the proposed counterfactual generative models. The generator $g_\theta$ is designed to produce samples of the outcome variable $Y(\overline{a})$ under a given time-varying treatment $\overline{a}$. The generated samples are expected to follow the proxy conditional distribution $f_{\theta}$, which approximates the underlying counterfactual distribution $f_{\overline{a}}$.
Figure 4: An illustration of our learning objective. We aim to minimize the KL-divergence between the proxy distribution $f_\theta(\cdot|\overline{a})$ and the true counterfactual distribution $f_{\overline{a}}$.
Figure 5: The estimated and true counterfactual distributions across various lengths of history dependence ($d=1,3,5$) on the fully synthetic datasets ($m=1$). Each sub-panel provides a comparison for a specific treatment combination $\overline{a}$.
...and 6 more figures

Theorems & Definitions (3)

Lemma 1
Proposition 1
Remark 1
