Double Variable Importance Matching to Estimate Distinct Causal Effects on Event Probability and Timing

Yuqi Li; Quinn Lanners; Matthew M. Engelhard

TL;DR

This paper introduces Double Variable Importance Matching, a method for estimating distinct causal effects on event probability (cure) and event timing in time-to-event data that include a cured subpopulation. A fitted mixture cure model supplies two separate covariate-weighted distance metrics, one per estimand, which are used to form targeted matched groups. Kaplan–Meier estimates within those groups yield covariate-specific estimates of cure probability ($\pi(x)$) and conditional mean event time ($\Delta(x)$). The paper provides consistency guarantees, characterizes the optimal weighting under an equal-scale constraint, and decomposes the estimation error. Simulations and a real-world ALL transplantation study indicate improved interpretability and robustness over standard matching and Cox methods, suggesting the approach may apply more broadly to time-to-event causal inference with cured subpopulations.

Abstract

In many clinical contexts, estimating effects of treatment in time-to-event data is complicated not only by confounding, censoring, and heterogeneity, but also by the presence of a cured subpopulation in which the event of interest never occurs. In such settings, treatment may have distinct effects on (1) the probability of being cured and (2) the event timing among non-cured individuals. Standard survival analysis and causal inference methods typically do not separate cured from non-cured individuals, obscuring distinct treatment mechanisms on cure probability and event timing. To address these challenges, we propose a matching-based framework that constructs distinct match groups to estimate heterogeneous treatment effects (HTE) on cure probability and event timing, respectively. We use mixture cure models to identify feature importance for both estimands, which in turn informs weighted distance metrics for matching in high-dimensional spaces. Within matched groups, Kaplan-Meier estimators provide estimates of cure probability and expected time to event, from which individual-level treatment effects are derived. We provide theoretical guarantees for estimator consistency and distance metric optimality under an equal-scale constraint. We further decompose estimation error into contributions from censoring, model fitting, and irreducible noise. Simulations and real-world data analyses demonstrate that our approach delivers interpretable and robust HTE estimates in time-to-event settings.
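The pipeline described in the abstract (two weighted distance metrics, matched groups, Kaplan-Meier estimates within each group) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. The function names (`weighted_knn`, `kaplan_meier`, `cure_and_mean_time`, `estimate_effects`) are hypothetical, and the weight vectors `w_cure` and `w_time` are assumed to be given; in the paper they are derived from a fitted mixture cure model. The cure probability is approximated here by the plateau of the Kaplan-Meier curve, a common simplification that assumes follow-up is long enough for the curve to level off.

```python
import numpy as np

def weighted_knn(X, query, weights, k):
    """Indices of the k rows of X closest to query under a weighted Euclidean metric."""
    d = np.sqrt((((X - query) ** 2) * weights).sum(axis=1))
    return np.argsort(d, kind="stable")[:k]

def kaplan_meier(times, events):
    """Distinct event times and the Kaplan-Meier survival value just after each."""
    times = np.asarray(times, dtype=float)
    events = np.asarray(events, dtype=int)
    event_times = np.unique(times[events == 1])
    surv, s = [], 1.0
    for t in event_times:
        at_risk = np.sum(times >= t)                 # subjects still under observation
        d = np.sum((times == t) & (events == 1))     # events occurring at t
        s *= 1.0 - d / at_risk
        surv.append(s)
    return event_times, np.array(surv)

def cure_and_mean_time(times, events):
    """Plateau-based cure probability and mean event time among the non-cured."""
    t, s = kaplan_meier(times, events)
    if t.size == 0:                                  # no events observed in the group
        return 1.0, float("nan")
    pi_hat = s[-1]                                   # assumption: KM plateau ~ cure probability
    if pi_hat >= 1.0:
        return 1.0, float("nan")
    # Survival of the non-cured: S_u(t) = (S(t) - pi) / (1 - pi).
    # Their mean event time is the area under S_u, a step function.
    s_prev = np.concatenate(([1.0], s[:-1]))         # S on [t_{k-1}, t_k)
    widths = np.diff(np.concatenate(([0.0], t)))
    mean_time = np.sum(widths * (s_prev - pi_hat) / (1.0 - pi_hat))
    return float(pi_hat), float(mean_time)

def estimate_effects(x, X, A, times, events, w_cure, w_time, k):
    """Treated-minus-control effects at x, using a separate match group per estimand."""
    effects = {}
    for name, w, stat in (("cure", w_cure, 0), ("time", w_time, 1)):
        vals = []
        for arm in (1, 0):
            arm_idx = np.flatnonzero(A == arm)
            nn = arm_idx[weighted_knn(X[arm_idx], x, w, k)]
            vals.append(cure_and_mean_time(times[nn], events[nn])[stat])
        effects[name] = vals[0] - vals[1]
    return effects
```

Using a separate weight vector per estimand is the point of the design: a covariate that matters for cure probability but not for timing gets a large weight only in the first metric, so each matched group is tight in the dimensions relevant to the quantity being estimated.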

Paper Structure (39 sections, 2 theorems, 84 equations, 3 figures, 3 tables, 1 algorithm)

Introduction
Literature Review
Matching for Causal Inference with Time-to-Event Data
Survival Analysis with a Cured Subpopulation
Setup and Estimands
Assumptions and Identification
Mixture Cure Model
Methodology
Fitting Mixture Cure Models
Learning Distance Metrics
Matching and HTE Estimation
Theoretical Results
Consistency of Matching Estimation
Optimal Distance Metric
Setup.
...and 24 more sections

Key Result

Proposition 1

If the consistency, unconfoundedness, positivity and non-informative censoring assumptions hold, then the two estimands can be expressed as

Figures (3)

Figure 1: Hypothetical Survival Curves Where Treatment Increases the Cure Probability yet Reduces the Conditional Mean Event Time.
Figure 2: Absolute HTE Estimation Error on Cure Probability.
Figure 3: Distribution of Estimated HTEs on Cure Probability under Leukemia-Free Survival (LFS).

Theorems & Definitions (2)

Proposition 1
Proposition 2
