SurvHTE-Bench: A Benchmark for Heterogeneous Treatment Effect Estimation in Survival Analysis

Shahriar Noroozizadeh; Xiaobin Shen; Jeremy C. Weiss; George H. Chen

TL;DR

SurvHTE-Bench provides the first rigorous comparison of survival HTE methods under diverse conditions and realistic assumption violations, establishing a foundation for fair, reproducible, and extensible evaluation of causal survival methods.

Abstract

Estimating heterogeneous treatment effects (HTEs) from right-censored survival data is critical in high-stakes applications such as precision medicine and individualized policy-making. Yet, the survival analysis setting poses unique challenges for HTE estimation due to censoring, unobserved counterfactuals, and complex identification assumptions. Despite recent advances, from Causal Survival Forests to survival meta-learners and outcome imputation approaches, evaluation practices remain fragmented and inconsistent. We introduce SurvHTE-Bench, the first comprehensive benchmark for HTE estimation with censored outcomes. The benchmark spans (i) a modular suite of synthetic datasets with known ground truth, systematically varying causal assumptions and survival dynamics, (ii) semi-synthetic datasets that pair real-world covariates with simulated treatments and outcomes, and (iii) real-world datasets from a twin study (with known ground truth) and from an HIV clinical trial. Across synthetic, semi-synthetic, and real-world settings, we provide the first rigorous comparison of survival HTE methods under diverse conditions and realistic assumption violations. SurvHTE-Bench establishes a foundation for fair, reproducible, and extensible evaluation of causal survival methods. The data and code of our benchmark are available at: https://github.com/Shahriarnz14/SurvHTE-Bench .
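To make the right-censored setting with known ground truth concrete, the following is a minimal sketch of a toy synthetic generator. It is not the benchmark's actual data-generating process: the exponential event and censoring times, the rate parameters, the single covariate, the randomized treatment, and the choice of mean survival time difference as the estimand are all illustrative assumptions, and the names `simulate_right_censored` and `cate_rmse` are hypothetical.

```python
import random


def simulate_right_censored(n, seed=0):
    """Toy generator; every distribution and rate below is an illustrative assumption."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = rng.random()                     # one covariate in [0, 1)
        w = 1 if rng.random() < 0.5 else 0   # randomized treatment assignment
        t0 = rng.expovariate(1.0 + x)        # potential event time under control
        t1 = rng.expovariate(1.0 + 0.5 * x)  # potential event time under treatment
        t = t1 if w == 1 else t0             # factual event time (other one is unobserved)
        c = rng.expovariate(0.5)             # censoring time, independent of t
        y = min(t, c)                        # observed follow-up time
        delta = 1 if t <= c else 0           # 1 = event observed, 0 = censored
        # Closed-form ground truth for this toy model: E[T(1) - T(0) | x].
        true_cate = 1.0 / (1.0 + 0.5 * x) - 1.0 / (1.0 + x)
        rows.append({"x": x, "w": w, "y": y, "delta": delta,
                     "t": t, "true_cate": true_cate})
    return rows


def cate_rmse(estimates, truths):
    """Root mean squared error between estimated and true CATE values."""
    squared = [(e - t) ** 2 for e, t in zip(estimates, truths)]
    return (sum(squared) / len(squared)) ** 0.5
```

An estimator would see only `x`, `w`, `y`, and `delta`; the latent `t` and `true_cate` are kept here solely so that an error metric such as `cate_rmse` can be computed against the known ground truth.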

Paper Structure (75 sections, 35 equations, 24 figures, 35 tables)

Introduction
Background and Related Work
SurvHTE-Bench
Benchmarking Results
Synthetic Experiment Results and Analyses
Semi-synthetic data results
Benchmarking on Real Data
Discussion
Additional Details of the Synthetic Datasets
Covariate generation
Treatment assignment mechanisms
Event time generation
Censoring time generation
Observed data construction
Remark on parameter calibration
...and 60 more sections

Figures (24)

Figure 1: (top) Borda count rankings of the top 10 estimator variants (out of 53 total), based on CATE RMSE across 40 datasets and averaged over 10 repeats (lower is better). (bottom) Family-level rankings, where for each dataset the best method variant within each method family is chosen using validation performance and then ranked on the held-out test set. Black bands connect methods without statistically significant differences (Wilcoxon signed-rank test, FDR-corrected at $\alpha=0.05$). Shaded regions indicate the standard error of the rank across datasets.
Figure 2: CATE RMSE in Scenario C across 10 experimental repeats.
Figure 3: CATE RMSE for twin birth data with $h=30$ days across 10 experimental runs.
Figure 4: CATE estimation comparison between baseline and high-censoring conditions under ZDV vs. ZDV+ddI treatments. Each point represents an individual patient, with the dashed diagonal line indicating perfect consistency between baseline CATE estimation and that with the additional censoring injected.
Figure 5: (Synthetic datasets) Kaplan-Meier curves across causal configurations (rows) and survival scenarios (columns). Solid lines show event-time survival under control (blue) and treatment (orange); dotted lines show censoring-time survival for each arm. Each panel reports the empirical censoring rate $c$ and treatment probability $p$.
...and 19 more figures
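The rank aggregation mentioned in the Figure 1 caption can be sketched as a mean rank per method across datasets, with lower RMSE giving a better (smaller) rank. This is only an illustration of a Borda-style aggregation: the tie handling (tied methods share the mean of their ranks), the function name `average_ranks`, and the toy RMSE values are assumptions, and the significance testing described in the caption is not covered.

```python
def average_ranks(rmse_by_dataset):
    """Map {dataset: {method: rmse}} to {method: mean rank across datasets}.

    Rank 1 is the lowest RMSE on a dataset. Tied methods share the mean of
    the ranks they span (an assumed convention).
    """
    ranks = {}
    for scores in rmse_by_dataset.values():
        ordered = sorted(scores, key=scores.get)
        i = 0
        while i < len(ordered):
            # Extend j to cover every method tied with ordered[i].
            j = i
            while j + 1 < len(ordered) and scores[ordered[j + 1]] == scores[ordered[i]]:
                j += 1
            shared_rank = (i + j) / 2 + 1  # positions are 0-based, ranks 1-based
            for method in ordered[i:j + 1]:
                ranks.setdefault(method, []).append(shared_rank)
            i = j + 1
    return {method: sum(r) / len(r) for method, r in ranks.items()}


# Hypothetical RMSE values for three methods on two datasets.
toy = {
    "d1": {"A": 0.2, "B": 0.5, "C": 0.3},
    "d2": {"A": 0.4, "B": 0.1, "C": 0.4},
}
print(average_ranks(toy))  # {'A': 1.75, 'B': 2.0, 'C': 2.25}
```

On `d2`, methods `A` and `C` tie for ranks 2 and 3, so each receives 2.5 under the assumed convention.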
