Deep Doubly Debiased Longitudinal Effect Estimation with ICE G-Computation

Wenxin Chen; Weishen Pan; Kyra Gan; Fei Wang

TL;DR

This work addresses longitudinal treatment effect estimation under time-varying confounding and treatment–confounder feedback by introducing D3-Net, a two-stage debiasing framework. During training, SDR-based pseudo-outcomes serve as regression targets that stabilize the recursive ICE G-computation, which is fit with a multi-task Transformer that also includes a covariate-simulator head and a target network. At inference, the model discards the SDR correction and applies LTMLE to the uncorrected nuisance models on the original outcomes, a second, targeted debiasing step aimed at robust finite-sample behavior. Across semi-synthetic MIMIC-based experiments, D3-Net consistently reduces bias and variance across horizons and confounding regimes, with ablations confirming the central role of SDR-based training and the stabilizing benefit of LTMLE re-debiasing. The approach supports evaluation of longitudinal treatment policies in sequential decision settings and highlights the value of combining debiasing stages in deep causal estimators.

Abstract

Estimating longitudinal treatment effects is essential for sequential decision-making but is challenging due to treatment-confounder feedback. While Iterative Conditional Expectation (ICE) G-computation offers a principled approach, its recursive structure suffers from error propagation, corrupting the learned outcome regression models. We propose D3-Net, a framework that mitigates error propagation in ICE training and then applies a robust final correction. First, to interrupt error propagation during learning, we train the ICE sequence using Sequential Doubly Robust (SDR) pseudo-outcomes, which provide bias-corrected targets for each regression. Second, we employ a multi-task Transformer with a covariate simulator head for auxiliary supervision, which regularizes representations against corruption by noisy pseudo-outcomes, and a target network to stabilize training dynamics. For the final estimate, we discard the SDR correction and instead use the uncorrected nuisance models to perform Longitudinal Targeted Minimum Loss-Based Estimation (LTMLE) on the original outcomes. This second-stage, targeted debiasing ensures robustness and optimal finite-sample properties. Comprehensive experiments demonstrate that D3-Net robustly reduces bias and variance across different horizons, counterfactual sequences, and time-varying confounding settings, compared to existing state-of-the-art ICE-based estimators.
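The ICE recursion described in the abstract can be illustrated with a minimal sketch. It assumes linear outcome regressions in place of the paper's Transformer, and a toy two-step simulation that is not from the paper; the names `simulate`, `fit_linear`, and `ice_gcomp` are hypothetical. The fitted step-2 regression becomes the regression target for step 1, so any error in the later model carries into the earlier one. This is the error propagation the abstract refers to.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def simulate(n, rng):
    """Toy two-step data with treatment-confounder feedback (illustrative only)."""
    L1 = rng.normal(size=n)
    A1 = rng.binomial(1, sigmoid(0.5 * L1))
    # L2 depends on the earlier treatment A1 and also drives the later treatment A2.
    L2 = 0.5 * L1 + 0.8 * A1 + rng.normal(size=n)
    A2 = rng.binomial(1, sigmoid(0.5 * L2))
    Y = 1.0 * A1 + 1.0 * A2 + 0.7 * L2 + 0.3 * L1 + rng.normal(size=n)
    return L1, A1, L2, A2, Y

def fit_linear(X, y):
    """Ordinary least squares with an intercept; returns a predict function."""
    Xb = np.column_stack([np.ones(len(X)), X])
    beta, *_ = np.linalg.lstsq(Xb, y, rcond=None)
    return lambda Xnew: np.column_stack([np.ones(len(Xnew)), Xnew]) @ beta

def ice_gcomp(L1, A1, L2, A2, Y, a1=1, a2=1):
    """ICE G-computation for E[Y^(a1, a2)] with two time steps."""
    n = len(Y)
    # Step t=2: regress Y on the full history, then evaluate at A2 = a2.
    q2 = fit_linear(np.column_stack([L1, A1, L2, A2]), Y)
    Q2 = q2(np.column_stack([L1, A1, L2, np.full(n, a2)]))
    # Step t=1: regress the step-2 prediction on the earlier history,
    # then evaluate at A1 = a1. Errors in q2 propagate into this target.
    q1 = fit_linear(np.column_stack([L1, A1]), Q2)
    Q1 = q1(np.column_stack([L1, np.full(n, a1)]))
    return Q1.mean()

rng = np.random.default_rng(0)
estimate = ice_gcomp(*simulate(20_000, rng))
# Under this simulation the analytic value is 1 + 1 + 0.7 * 0.8 = 2.56.
print(f"ICE estimate of E[Y^(1,1)]: {estimate:.3f}")
```

Both regressions are correctly specified here, so plain ICE recovers the target. The SDR-based targets in D3-Net address the case where the later regression is imperfect.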

Paper Structure (37 sections, 4 theorems, 15 equations, 6 figures, 4 tables, 3 algorithms)

Introduction
Related work
Longitudinal Causal Estimation
Off-Policy Evaluation in RL
Deep Learning for Sequential Estimation
Problem Setup and Preliminary
Problem Setup
ICE G-computation
Semi-parametric Efficiency
LTMLE
SDR
Methods
Debias During Training via SDR
Auxiliary Supervision
Re-debias via LTMLE
...and 22 more sections

Key Result

Lemma 2

Assume that there is a time $k$ such that $\|\hat{Q}_t - Q_t^0\| = o_p(1)$ for all $t>k$ and $\|\hat{G}_t - G_t^0\| = o_p(1)$ for all $t\leq k$. Then we have $\hat{\psi}^d = \psi^d + o_p(1)$.
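As a worked illustration of the condition, consider a horizon of $\tau = 2$. The indexing is an assumption, since this summary does not state it: $t \in \{1, \dots, \tau\}$ and $k \in \{0, \dots, \tau\}$. Under that convention, the lemma's premise holds if any one of the following three scenarios holds:

```latex
\begin{aligned}
k = 0:&\quad \|\hat{Q}_1 - Q_1^0\| = o_p(1),\ \ \|\hat{Q}_2 - Q_2^0\| = o_p(1) \\
k = 1:&\quad \|\hat{G}_1 - G_1^0\| = o_p(1),\ \ \|\hat{Q}_2 - Q_2^0\| = o_p(1) \\
k = 2:&\quad \|\hat{G}_1 - G_1^0\| = o_p(1),\ \ \|\hat{G}_2 - G_2^0\| = o_p(1)
\end{aligned}
```

These are $\tau + 1 = 3$ distinct ways to obtain consistency, which matches the "$\tau+1$ multiply robust" label attached to this lemma.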

Figures (6)

Figure 1: Overview of the $D^3$-Net architecture and training procedure. $D^3$-Net uses a shared multi-task Transformer backbone with the outcome ($Q$), treatment ($G$), and simulator ($S$) heads. At each time step $t$, the $Q$-head is trained using regression targets constructed from the SDR transformation, which combines future outcome and treatment models to form bias-corrected pseudo-outcomes $Q_{t+1}^\dagger(a_{t+1}, H_{t+1})$. A target network, implemented as a delayed copy of the main network, is used to generate stable SDR targets for recursive learning.
Figure 2: Absolute bias of CAPO estimates across counterfactual sequences for horizons $\tau \in \{10,15,20\}$ under limited (left) and expanded (right) time-varying confounding. $D^3$-Net consistently achieves lower bias and smaller dispersion.
Figure 3: Ablation study of $D^3$-Net. Left: Effect of LTMLE re-debiasing on top of SDR across horizons; the y-axis shows the change in absolute bias (LTMLE minus raw SDR), with each box corresponding to a counterfactual sequence. Right: Ablation of training components across horizons; the y-axis shows mean absolute bias $\pm$ standard deviation.
Figure 4: Distribution of CAPO estimates of 72-hour serum lactate across four MAP targets (65–80 mmHg), aggregated over 20 bootstrap experiments. Higher MAP targets are associated with modestly higher lactate levels, consistent with prior evidence and current clinical guidelines.
Figure S1: RMSE of CAPO estimates across counterfactual sequences for horizons $\tau \in \{10,15,20\}$ under limited (left) and expanded (right) time-varying confounding.
...and 1 more figure
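The Figure 1 caption describes the target network as a delayed copy of the main network. One common way to realize such a delay is a periodic hard copy of the parameters, sketched below. The sync schedule, the parameter layout, and the names `sync_target` and `train_with_target` are assumptions for illustration; the summary does not specify how the delay is implemented.

```python
import numpy as np

def sync_target(main_params):
    """Return an independent snapshot of the main network's parameters."""
    return {name: value.copy() for name, value in main_params.items()}

def train_with_target(num_steps, sync_every, rng):
    """Toy loop: the main parameters move every step, the target only on sync steps."""
    main = {"w": np.zeros(3)}
    target = sync_target(main)
    lag_history = []
    for step in range(1, num_steps + 1):
        # Placeholder update standing in for a gradient step on a loss whose
        # regression targets would be computed from `target`, not from `main`.
        main["w"] = main["w"] + 0.1 * rng.normal(size=3)
        if step % sync_every == 0:
            target = sync_target(main)  # assumed schedule: hard copy every sync_every steps
        # Largest parameter gap between the main and target networks after this step.
        lag_history.append(float(np.abs(main["w"] - target["w"]).max()))
    return main, target, lag_history

rng = np.random.default_rng(0)
main, target, lag_history = train_with_target(num_steps=10, sync_every=5, rng=rng)
print(lag_history)
```

Between syncs the target stays fixed while the main network moves, so targets built from it do not shift with every update.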

Theorems & Definitions (5)

Remark 1
Lemma 2: $\tau+1$ multiply robust consistency of LTMLE [diaz2023nonparametric]
Lemma 3: $2^\tau$ multiply robust consistency [diaz2023nonparametric]
Lemma 3: First- vs. second-order dependence in training targets
Lemma 3: First- vs. second-order dependence in training targets
