Causal survival embeddings: non-parametric counterfactual inference under censoring
Carlos García-Meixide, Marcos Matabuena
TL;DR
This paper presents a non-parametric, model-free framework for counterfactual survival analysis under right-censoring by embedding counterfactual distributions in reproducing kernel Hilbert spaces. The approach leverages kernel mean embeddings and integrated depth bands to adjust for confounding without requiring density smoothness, and it provides Hadamard-differentiable operators with convergence guarantees. Through simulations and an application to the SPRINT trial, the method demonstrates stable performance under censoring and offers a flexible tool for time-varying causal inference and hypothesis testing in observational studies. The work sits at the intersection of causal inference, survival analysis, and RKHS theory, offering a practical, extensible alternative to semi-parametric methods with potential for incorporating complex predictors and advanced testing procedures.
Abstract
Model-free time-to-event regression under confounding is challenging because both the causal and the censoring sampling mechanisms introduce bias. These biases pose problems for classical non-parametric estimators such as Beran's estimator or the k-nearest neighbours algorithm. In this study, we propose a natural framework that leverages the structure of reproducing kernel Hilbert spaces (RKHS) and, specifically, the concept of kernel mean embedding to address these limitations. Our framework has the potential to enable statistical counterfactual modeling, including counterfactual prediction and hypothesis testing, under right-censoring schemes. Through simulations and an application to the SPRINT trial, we demonstrate the practical effectiveness of our method, which yields results consistent with parallel analyses in the existing literature. We also provide a theoretical analysis of our estimator through an RKHS-valued empirical process. Our approach offers a novel tool for performing counterfactual survival estimation in observational studies with incomplete information. It can also be complemented by state-of-the-art algorithms based on semi-parametric and parametric models.
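To make the idea of a kernel mean embedding under right-censoring concrete, the following is a minimal sketch and not the estimator proposed in the paper. It assumes no covariates and no treatment adjustment, and it swaps in a standard device: the jump sizes of the Kaplan-Meier estimator are used as weights on a Gaussian kernel. The function names (`kaplan_meier_weights`, `gaussian_kernel`, `censored_mean_embedding`) and the `bandwidth` parameter are hypothetical and chosen only for illustration.

```python
import numpy as np


def kaplan_meier_weights(times, events):
    """Jump sizes of the Kaplan-Meier estimator at each observation.

    times  : observed times (event or censoring, whichever came first)
    events : 1 if the event was observed, 0 if the time is right-censored
    Censored observations receive weight 0; with no censoring every
    weight equals 1/n, recovering the ordinary empirical distribution.
    """
    times = np.asarray(times, dtype=float)
    events = np.asarray(events, dtype=float)
    n = len(times)
    # Sort by time; on ties, events come before censored observations.
    order = np.lexsort((1 - events, times))
    d = events[order]
    ranks = np.arange(1, n + 1)
    # Kaplan-Meier survival just before the i-th ordered observation.
    factors = ((n - ranks) / (n - ranks + 1.0)) ** d
    surv_before = np.concatenate(([1.0], np.cumprod(factors)[:-1]))
    w_sorted = d / (n - ranks + 1.0) * surv_before
    # Map the weights back to the original observation order.
    weights = np.empty(n)
    weights[order] = w_sorted
    return weights


def gaussian_kernel(x, y, bandwidth=1.0):
    """Gaussian kernel matrix between two 1-D arrays of time points."""
    diff = np.asarray(x, dtype=float)[:, None] - np.asarray(y, dtype=float)[None, :]
    return np.exp(-diff**2 / (2.0 * bandwidth**2))


def censored_mean_embedding(times, events, grid, bandwidth=1.0):
    """Evaluate the weighted embedding sum_i w_i k(T_i, t) at each t in grid."""
    w = kaplan_meier_weights(times, events)
    return w @ gaussian_kernel(times, grid, bandwidth)


# Illustrative (made-up) data: three subjects, the second is censored.
mu = censored_mean_embedding([1.0, 2.0, 3.0], [1, 0, 1], grid=[1.0, 3.0])
```

In this toy setting the censored observation contributes no mass of its own; its mass is redistributed to later event times. The framework described in the abstract additionally adjusts for confounding, which this sketch does not attempt.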
