Double Variable Importance Matching to Estimate Distinct Causal Effects on Event Probability and Timing
Yuqi Li, Quinn Lanners, Matthew M. Engelhard
TL;DR
This paper introduces Double Variable Importance Matching to estimate distinct causal effects on event probability (cure) and event timing in time-to-event data that include a cured subpopulation. A fitted mixture cure model supplies variable importance for each estimand, yielding two separate covariate-weighted distance metrics that form targeted matched groups. Kaplan–Meier estimates within those groups give individual-level estimates of cure probability ($\pi(x)$) and conditional mean event time ($\Delta(x)$). The approach comes with consistency guarantees, a characterization of the optimal distance weighting under an equal-scale constraint, and a decomposition of estimation error into contributions from censoring, model fitting, and irreducible noise. Simulations and a real-world ALL transplantation study indicate improved interpretability and robustness over standard matching and Cox methods, pointing to clinically meaningful insights and potential for broader application in time-to-event causal inference with cured subpopulations.
Abstract
In many clinical contexts, estimating effects of treatment in time-to-event data is complicated not only by confounding, censoring, and heterogeneity, but also by the presence of a cured subpopulation in which the event of interest never occurs. In such settings, treatment may have distinct effects on (1) the probability of being cured and (2) the event timing among non-cured individuals. Standard survival analysis and causal inference methods typically do not separate cured from non-cured individuals, obscuring distinct treatment mechanisms on cure probability and event timing. To address these challenges, we propose a matching-based framework that constructs distinct match groups to estimate heterogeneous treatment effects (HTE) on cure probability and event timing, respectively. We use mixture cure models to identify feature importance for both estimands, which in turn informs weighted distance metrics for matching in high-dimensional spaces. Within matched groups, Kaplan-Meier estimators provide estimates of cure probability and expected time to event, from which individual-level treatment effects are derived. We provide theoretical guarantees for estimator consistency and distance metric optimality under an equal-scale constraint. We further decompose estimation error into contributions from censoring, model fitting, and irreducible noise. Simulations and real-world data analyses demonstrate that our approach delivers interpretable and robust HTE estimates in time-to-event settings.
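The matching-then-Kaplan–Meier step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, the two weight vectors (`w_cure`, `w_time`) are assumed to come from an already fitted mixture cure model, the Kaplan–Meier plateau is used as a stand-in cure estimate, and the mean event time among the non-cured is computed from the drops in the survival curve. The exact estimators in the paper may differ.

```python
import numpy as np

def kaplan_meier(times, events):
    """Distinct event times and the Kaplan-Meier survival estimate at each."""
    times = np.asarray(times, dtype=float)
    events = np.asarray(events, dtype=bool)
    event_times = np.unique(times[events])
    surv, s = [], 1.0
    for t in event_times:
        at_risk = np.sum(times >= t)
        n_events = np.sum((times == t) & events)
        s *= 1.0 - n_events / at_risk
        surv.append(s)
    return event_times, np.array(surv)

def cure_and_timing(times, events):
    """Assumed estimators: KM plateau as cure probability, and the
    mean event time among the non-cured from the KM probability mass."""
    t, s = kaplan_meier(times, events)
    if t.size == 0:                      # no observed events in the group
        return 1.0, float("nan")
    pi = s[-1]                           # plateau after the last event
    mass = np.concatenate(([1.0], s[:-1])) - s   # drop in S at each event time
    return pi, float(np.sum(t * mass) / (1.0 - pi))

def weighted_neighbors(X, x0, weights, k):
    """Indices of the k rows of X closest to x0 under a weighted Euclidean distance."""
    d = np.sqrt((((X - x0) ** 2) * weights).sum(axis=1))
    return np.argsort(d, kind="stable")[:k]

def matched_effects(X, treat, times, events, x0, w_cure, w_time, k):
    """Effects at x0 on cure probability and event timing (treated minus control),
    using a separate matched group for each estimand."""
    X, treat = np.asarray(X, dtype=float), np.asarray(treat)
    times, events = np.asarray(times, dtype=float), np.asarray(events, dtype=bool)
    out = {}
    for name, w, which in (("cure", w_cure, 0), ("timing", w_time, 1)):
        est = []
        for arm_value in (0, 1):
            arm = np.flatnonzero(treat == arm_value)
            nn = arm[weighted_neighbors(X[arm], x0, np.asarray(w, dtype=float), k)]
            est.append(cure_and_timing(times[nn], events[nn])[which])
        out[name] = est[1] - est[0]
    return out
```

Using two weight vectors means the neighbors chosen for the cure estimand can differ from those chosen for the timing estimand, which is the point of constructing distinct match groups.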
