Survival Kernets: Scalable and Interpretable Deep Kernel Survival Analysis with an Accuracy Guarantee
George H. Chen
TL;DR
Survival kernets are a scalable, interpretable deep kernel survival analysis model that uses kernel netting to compress the training data into clusters for efficient test-time prediction. A warm-start strategy (tuna) based on scalable tree ensembles such as XGBoost, together with a heuristic for accelerating neural architecture search, speeds up training. This makes evaluation feasible on datasets with up to roughly 3 million points while keeping accuracy competitive. The model yields interpretable cluster-level visualizations (heatmaps and Kaplan-Meier curves). For a special case, it comes with a finite-sample error bound on predicted survival distributions that is optimal up to a log factor, which links kernel-netted predictions to classical Kaplan-Meier estimates. On four survival datasets of varying sizes, survival kernets achieve strong time-dependent concordance while substantially reducing training time, and the cluster visualizations illustrate survival patterns that may offer clinical insight.
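The link to classical Kaplan-Meier estimates can be made concrete with a kernel-weighted Kaplan-Meier estimator (often called the Beran estimator). In this estimator, each training point's death and at-risk indicators are weighted by its kernel similarity to the query point. The following is a minimal sketch under stated assumptions: it uses a fixed Gaussian kernel on raw feature vectors rather than a learned deep kernel, and the function name `kernel_km` and its `bandwidth` parameter are hypothetical. It illustrates the general estimator and is not the survival kernet implementation.

```python
import numpy as np


def kernel_km(x, X_train, times, events, bandwidth=1.0):
    """Kernel-weighted Kaplan-Meier (Beran-style) estimate of S(t | x).

    Hypothetical helper for illustration: uses a fixed Gaussian kernel on the
    raw features, not a learned (deep) kernel.
    Returns (unique event times, survival probability just after each time).
    """
    # Gaussian kernel weight between the query x and every training point
    sq_dists = np.sum((X_train - x) ** 2, axis=1)
    w = np.exp(-sq_dists / bandwidth)

    event_times = np.unique(times[events == 1])
    surv = np.empty(len(event_times))
    s = 1.0
    for j, t in enumerate(event_times):
        # kernel-weighted number of deaths at t and number still at risk at t
        d = np.sum(w * ((times == t) & (events == 1)))
        n = np.sum(w * (times >= t))
        if n > 0:
            s *= 1.0 - d / n  # product-limit update
        surv[j] = s
    return event_times, surv
```

With a very large bandwidth every weight is close to 1, so the estimate reduces to the ordinary Kaplan-Meier curve. A small bandwidth makes the curve depend mostly on training points near `x`. This version touches every training point for each query, which is the test-time cost that kernel netting is designed to avoid.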
Abstract
Kernel survival analysis models estimate individual survival distributions with the help of a kernel function, which measures the similarity between any two data points. Such a kernel function can be learned using deep kernel survival models. In this paper, we present a new deep kernel survival model called a survival kernet, which scales to large datasets in a manner that is amenable to model interpretation and also theoretical analysis. Specifically, the training data are partitioned into clusters based on a recently developed training set compression scheme for classification and regression called kernel netting that we extend to the survival analysis setting. At test time, each data point is represented as a weighted combination of these clusters, and each such cluster can be visualized. For a special case of survival kernets, we establish a finite-sample error bound on predicted survival distributions that is, up to a log factor, optimal. Whereas scalability at test time is achieved using the aforementioned kernel netting compression strategy, scalability during training is achieved by a warm-start procedure based on tree ensembles such as XGBoost and a heuristic approach to accelerating neural architecture search. On four standard survival analysis datasets of varying sizes (up to roughly 3 million data points), we show that survival kernets are highly competitive compared to various baselines tested in terms of time-dependent concordance index. Our code is available at: https://github.com/georgehc/survival-kernets
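The compression idea described in the abstract can be sketched as follows. The sketch assumes the training points have already been mapped to embedding vectors: in a survival kernet these come from a learned neural network, but here they are simply given as an array `Z`. A greedy epsilon-net picks cluster centers so that every training point lies within `eps` of its center. Each cluster is then summarized by its event counts and at-risk counts on a shared time grid. A test point's survival curve is computed from a kernel-weighted combination of these cluster summaries. The function names (`build_net`, `summarize`, `predict`), the Gaussian kernel, and the `bandwidth`/`cutoff` parameters are hypothetical simplifications. They do not reproduce the paper's exact construction or the API of the linked repository.

```python
import numpy as np


def build_net(Z, eps):
    """Greedy eps-net (hypothetical helper): each point ends up within eps of
    its cluster center, and centers are more than eps apart."""
    centers = []
    labels = np.full(len(Z), -1)
    for i in range(len(Z)):
        if labels[i] != -1:
            continue
        # point i becomes a new center and claims unassigned points within eps
        centers.append(i)
        d = np.linalg.norm(Z - Z[i], axis=1)
        labels[(labels == -1) & (d <= eps)] = len(centers) - 1
    return np.array(centers), labels


def summarize(labels, n_clusters, times, events, time_grid):
    """Per-cluster death counts D and at-risk counts N on a shared time grid."""
    D = np.zeros((n_clusters, len(time_grid)))
    N = np.zeros((n_clusters, len(time_grid)))
    for c in range(n_clusters):
        t_c, e_c = times[labels == c], events[labels == c]
        D[c] = [np.sum((t_c == t) & (e_c == 1)) for t in time_grid]
        N[c] = [np.sum(t_c >= t) for t in time_grid]
    return D, N


def predict(z, Z_centers, D, N, bandwidth=1.0, cutoff=np.inf):
    """Survival curve for embedding z from kernel-weighted cluster summaries."""
    d2 = np.sum((Z_centers - z) ** 2, axis=1)
    w = np.exp(-d2 / bandwidth)
    w[np.sqrt(d2) > cutoff] = 0.0  # truncated kernel: far clusters are ignored
    deaths = w @ D
    at_risk = w @ N
    # hazard is 0 wherever no (weighted) point is at risk
    hazard = np.divide(deaths, at_risk, out=np.zeros_like(deaths),
                       where=at_risk > 0)
    return np.cumprod(1.0 - hazard)
```

Because the per-cluster counts sum to the full-data counts, a very large bandwidth with no cutoff recovers the ordinary Kaplan-Meier curve however the data were clustered. At test time, the cost scales with the number of clusters rather than the number of training points.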
