Synthetic Survival Control: Extending Synthetic Controls for "When-If" Decision

Jessy Xinyi Han, Devavrat Shah

TL;DR

This work tackles causal inference for time-to-event outcomes under censoring in observational panel data. It introduces Synthetic Survival Control (SSC), a low-rank panel survival framework that constructs a unit's counterfactual hazard trajectory as a weighted combination of the observed trajectories of donor units. The paper establishes identification and finite-sample guarantees under both no unobserved confounding and latent-factor confounding, and validates SSC through synthetic simulations and a TCL clinical application, where access to novel therapies is associated with lower post-intervention hazards relative to the synthetic counterparts. The methodology integrates time-to-event modeling with causal inference, enabling interpretable counterfactual survival estimates in medicine, economics, and public policy. The results suggest SSC as a practical tool for evaluating interventions when randomized experiments are infeasible, with performance checked via held-out validation and bootstrap uncertainty quantification.

Abstract

Estimating causal effects on time-to-event outcomes from observational data is particularly challenging due to censoring, limited sample sizes, and non-random treatment assignment. The need for answering such "when-if" questions--how the timing of an event would change under a specified intervention--commonly arises in real-world settings with heterogeneous treatment adoption and confounding. To address these challenges, we propose Synthetic Survival Control (SSC) to estimate counterfactual hazard trajectories in a panel data setting where multiple units experience potentially different treatments over multiple periods. In such a setting, SSC estimates the counterfactual hazard trajectory for a unit of interest as a weighted combination of the observed trajectories from other units. To provide formal justification, we introduce a panel framework with a low-rank structure for causal survival analysis. Indeed, such a structure naturally arises under classical parametric survival models. Within this framework, for the causal estimand of interest, we establish identification and finite sample guarantees for SSC. We validate our approach using a multi-country clinical dataset of cancer treatment outcomes, where the staggered introduction of new therapies creates a quasi-experimental setting. Empirically, we find that access to novel treatments is associated with improved survival, as reflected by lower post-intervention hazard trajectories relative to their synthetic counterparts. Given the broad relevance of survival analysis across medicine, economics, and public policy, our framework offers a general and interpretable tool for counterfactual survival inference using observational data.
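The weighted-combination idea in the abstract can be illustrated with a minimal sketch. It is not the paper's estimator: it assumes a noiseless, rank-2 panel of log-hazard trajectories, skips censoring and hazard estimation entirely, and fits the weights by plain least squares on the pre-intervention period. All names (`donors`, `target`, `w_hat`, the grid sizes) are hypothetical and chosen for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: pre/post time-grid points, donor units, latent rank.
T_pre, T_post, n_donors, rank = 30, 20, 8, 2
T = T_pre + T_post

# Assumed low-rank panel: each donor's log-hazard trajectory is a
# combination of two shared time factors (a constant and a linear trend).
time_factors = np.column_stack([np.ones(T), np.linspace(0.0, 1.0, T)])
donor_loadings = rng.normal(size=(rank, n_donors))
donors = time_factors @ donor_loadings          # shape (T, n_donors)

# Assumed target unit: its control trajectory is an exact weighted
# combination of the donor trajectories (no noise in this toy setup).
true_w = rng.dirichlet(np.ones(n_donors))
target = donors @ true_w                        # shape (T,)

# Fit weights using the pre-intervention period only
# (minimum-norm least squares; rcond truncates numerically zero modes).
w_hat, *_ = np.linalg.lstsq(donors[:T_pre], target[:T_pre], rcond=1e-10)

# Extrapolate the counterfactual control trajectory to the post-period
# and measure the worst-case (sup-norm) error against the truth.
counterfactual_post = donors[T_pre:] @ w_hat
sup_error = np.max(np.abs(counterfactual_post - target[T_pre:]))
print(f"sup-norm error on post-period: {sup_error:.2e}")
```

In this toy setting the fitted weights need not equal `true_w`, since the donor panel is rank-deficient, yet the extrapolated trajectory still matches the truth. This shows why a low-rank structure is enough for the weighted combination to carry over from the pre-period to the post-period.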

Paper Structure

This paper contains 63 sections, 13 theorems, 136 equations, 6 figures, 2 tables.

Key Result

Proposition 1

Suppose the assumptions from SUTVA through positivity hold, together with no unobserved confounding, $\{\tau^{(0)}_{p, n}, \tau^{(1)}_{p, n}\} \perp D_{p, n} \mid X_{p, n}$. Then, for some fixed time horizon $\widetilde{\tau}>0$, the marginal potential hazard function is identified as

Figures (6)

  • Figure 1: DAGs of panel data generating processes. Dashed arrows indicate potential confounding paths. Gray and light pink nodes are latent at the time of analysis. $D_{p, n}$: treatment assignment, $\tau_{p, n}$: event time, $C_{p, n}$: censoring time, $T_{p, n}$: observed time. Figure (a) shows the most fundamental case with fully observed covariates $X_{p, n}$; (b) shows the setup where $X_{p, n}$ are unobserved and consist of latent unit factor $V_n$ and latent period factor $U_p$. The unit $n$ ellipse contains all unit-related information while the period $p$ ellipse contains all period-related information.
  • Figure 2: Visualization of the setup. $\mathcal{I}^{(0)}$ and $\mathcal{I}^{(1)}$ denote the control group and the treatment group, respectively. The shaded area represents the observed data, while the question-mark region denotes our target.
  • Figure 3: Counterfactual survival estimation under two DGPs. Post-period control survival trajectory for the treated unit under: (i) the true DGP, (ii) SSC, and (iii) the Confounder-Aware Parametric Estimator which observes the true latent factors and fits the correctly specified parametric survival model. SSC closely approximates the true curve in both Cox and Aalen models.
  • Figure 4: SSC sup-norm errors across sample sizes. Distribution of sup-norm estimation errors for SSC under Cox and Aalen DGPs over 20 simulations each. Errors decrease rapidly with increasing sample size $K$, and variability contracts accordingly.
  • Figure 5: Pre-/Post-treatment survival function: USA factual (SA in second-line, orange), synthetic counterfactual under CC (blue), USA held-out (CC in second-line, green).
  • ...and 1 more figure

Theorems & Definitions (23)

  • Proposition 1: Identification Under No Unobserved Confounding
  • Lemma 1
  • Theorem 1
  • Theorem 2: Consistency of Counterfactual Transformed Hazard Estimation
  • proof
  • proof
  • Lemma 2: Perturbation of Singular Values
  • proof
  • Lemma 3: Operator-norm control from survival estimation error
  • proof
  • ...and 13 more