Upgrading survival models with CARE
William G. Underwood, Henry W. J. Reeve, Oliver Y. Feng, Samuel A. Lambert, Bhramar Mukherjee, Richard J. Samworth
TL;DR
CARE upgrades survival predictions when new covariates become available by combining a flexible reproducing kernel Hilbert space (RKHS) relative risk estimator with existing pre-trained estimators through convex aggregation. The method fits the RKHS estimator via penalised partial likelihood and selects both the regularisation parameter and the aggregation weights by cross-validation, yielding an estimator with favourable $L_2$-error properties and oracle-type guarantees. Theoretical results show that the kernel estimator achieves rates depending on the kernel spectrum, while CARE adapts to the better of the kernel and external models, with near-oracle performance. Empirical studies on simulations and UK Biobank cardiovascular risk data demonstrate robust gains from CARE, including improvements in concordance for SCORE2 updates. The approach is implemented in the Python package care-survival.
Abstract
Clinical risk prediction models are regularly updated as new data, often with additional covariates, become available. We propose CARE (Convex Aggregation of relative Risk Estimators) as a general approach for combining existing "external" estimators with a new data set in a time-to-event survival analysis setting. Our method initially employs the new data to fit a flexible family of reproducing kernel estimators via penalised partial likelihood maximisation. The final relative risk estimator is then constructed as a convex combination of the kernel and external estimators, with the convex combination coefficients and regularisation parameters selected using cross-validation. We establish high-probability bounds for the $L_2$-error of our proposed aggregated estimator, showing that it achieves a rate of convergence that is at least as good as both the optimal kernel estimator and the best external model. Empirical results from simulation studies align with the theoretical results, and we illustrate the improvements our methods provide for cardiovascular disease risk modelling. Our methodology is implemented in the Python package care-survival.
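The aggregation step described above can be illustrated with a minimal NumPy sketch. It is not the care-survival API: the function names `neg_log_partial_likelihood` and `aggregate_weight` are hypothetical, the scores are assumed to be log relative risks already evaluated on held-out data, a single validation split with a grid search stands in for the cross-validation used in the paper, and event times are assumed to have no ties.

```python
import numpy as np


def neg_log_partial_likelihood(scores, times, events):
    """Average negative log Cox partial likelihood (assumes no tied times)."""
    order = np.argsort(-times)  # decreasing time: risk sets become prefixes
    s = scores[order]
    e = events[order]
    # log sum_{j: T_j >= T_i} exp(score_j), computed stably as a running log-sum-exp
    log_risk = np.logaddexp.accumulate(s)
    return -np.sum((s - log_risk)[e == 1]) / len(scores)


def aggregate_weight(f_kernel, f_external, times, events, grid=None):
    """Pick theta in [0, 1] minimising the held-out loss of the convex
    combination (1 - theta) * f_kernel + theta * f_external.

    Assumption: the combination is taken on the log relative risk scale.
    """
    if grid is None:
        grid = np.linspace(0.0, 1.0, 101)
    losses = [
        neg_log_partial_likelihood((1 - t) * f_kernel + t * f_external, times, events)
        for t in grid
    ]
    best = int(np.argmin(losses))
    return grid[best], losses[best]
```

Because the grid contains both endpoints, the selected combination never has a larger held-out loss than either the kernel estimator or the external estimator on its own, which mirrors in a small way the "at least as good as both" guarantee stated in the abstract.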
