Upgrading survival models with CARE

William G. Underwood, Henry W. J. Reeve, Oliver Y. Feng, Samuel A. Lambert, Bhramar Mukherjee, Richard J. Samworth

TL;DR

CARE upgrades survival predictions when new covariates become available by combining a flexible RKHS-based relative risk estimator with existing pre-trained estimators through convex aggregation. The RKHS estimator is fitted by penalised partial likelihood, and both the regularisation parameter and the aggregation weights are selected by cross-validation, yielding an estimator with favourable $L_2$-error properties and oracle-type guarantees. Theoretical results show that the kernel estimator attains rates that depend on the kernel spectrum, while CARE adapts to whichever of the kernel and external models performs better, with near-oracle performance. Empirical studies on simulations and UK Biobank cardiovascular risk data show robust gains from CARE, including improved concordance for SCORE2 updates. The approach is implemented in the Python package care-survival.

Abstract

Clinical risk prediction models are regularly updated as new data, often with additional covariates, become available. We propose CARE (Convex Aggregation of relative Risk Estimators) as a general approach for combining existing "external" estimators with a new data set in a time-to-event survival analysis setting. Our method initially employs the new data to fit a flexible family of reproducing kernel estimators via penalised partial likelihood maximisation. The final relative risk estimator is then constructed as a convex combination of the kernel and external estimators, with the convex combination coefficients and regularisation parameters selected using cross-validation. We establish high-probability bounds for the $L_2$-error of our proposed aggregated estimator, showing that it achieves a rate of convergence that is at least as good as both the optimal kernel estimator and the best external model. Empirical results from simulation studies align with the theoretical results, and we illustrate the improvements our methods provide for cardiovascular disease risk modelling. Our methodology is implemented in the Python package care-survival.
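The aggregation step described in the abstract can be illustrated with a short NumPy sketch. This is not the care-survival API. It assumes no tied event times, a single external estimator, and a grid search over one convex weight on a held-out sample. The function names (`neg_log_partial_likelihood`, `select_weight`), the 1/n normalisation, the convention that the weight multiplies the external estimator, and the synthetic data are all hypothetical choices made for illustration; the kernel fit itself is replaced by a noisy placeholder.

```python
import numpy as np

def neg_log_partial_likelihood(f, time, event):
    """Average negative Cox log partial likelihood of risk scores f (assumes no tied times)."""
    order = np.argsort(-time)                      # sort subjects by decreasing observed time
    f_sorted, event_sorted = f[order], event[order]
    # log sum_{j: T_j >= T_i} exp(f_j), computed stably as a running log-sum-exp
    log_risk_set = np.logaddexp.accumulate(f_sorted)
    return -np.sum(event_sorted * (f_sorted - log_risk_set)) / len(f)

def select_weight(f_kernel, f_external, time, event, grid=None):
    """Choose theta in [0, 1] minimising the validation loss of (1 - theta) * kernel + theta * external."""
    grid = np.linspace(0.0, 1.0, 21) if grid is None else np.asarray(grid)
    losses = np.array([
        neg_log_partial_likelihood((1 - t) * f_kernel + t * f_external, time, event)
        for t in grid
    ])
    best = int(np.argmin(losses))
    return float(grid[best]), float(losses[best])

# Synthetic validation sample (hypothetical data-generating process).
rng = np.random.default_rng(0)
n = 200
x = rng.uniform(-1, 1, size=(n, 2))
f_true = x[:, 0] + np.sin(3 * x[:, 1])             # assumed true log relative risk
event_time = rng.exponential(1.0 / np.exp(f_true)) # hazard proportional to exp(f_true)
censor_time = rng.exponential(2.0, size=n)
time = np.minimum(event_time, censor_time)
event = (event_time <= censor_time).astype(float)

f_external = x[:, 0]                               # stand-in for a pre-trained model using one covariate
f_kernel = f_true + rng.normal(scale=0.5, size=n)  # stand-in for a noisy kernel fit

theta, loss = select_weight(f_kernel, f_external, time, event)
print(theta, loss)
```

Because the grid contains both endpoints, the selected combination is never worse on the validation sample than either the kernel or the external scores alone, which mirrors (in a toy way) the "at least as good as both" guarantee stated above.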

Paper Structure

This paper contains 50 sections, 46 theorems, 323 equations, 12 figures.

Key Result

Lemma 1

We have $f_0 \in \mathop{\mathrm{argmin}}\limits_{f \in {\mathcal{B}}({\mathcal{X}})} \ell_\star(f)$. Further, $f_0$ is unique in the sense that if $\tilde{f} \in \mathop{\mathrm{argmin}}\limits_{f \in {\mathcal{B}}({\mathcal{X}})} \ell_\star(f)$ and $P_X(\tilde{f}) = 0$, then $\tilde{f}(x) = f_0(x)

Figures (12)

  • Figure 1: Instances of the kernels described in Examples 3, 4 and 5.
  • Figure 2: In panel (a), we plot a sample of size $n = 200$ from the specified data distribution. In panel (b), we display the associated Breslow estimator of the survival function.
  • Figure 3: In panel (a), we show how cross-validation is conducted by minimising the negative partial log-likelihood on an independent sample of size $n=200$. In panel (b), we illustrate how the resulting estimator approximates the true relative risk function.
  • Figure 4: In panel (a), we plot the cross-validated and oracle choices of regularisation parameter, as a function of the sample size. In panel (b), we plot the corresponding $L_2$-errors of the fitted prediction functions.
  • Figure 5: In panel (a), we plot the cross-validated and oracle choices of the convex combination parameter as a function of the sample size. In panel (b), we compare the corresponding CARE, kernel, oracle and external estimators.
  • ...and 7 more figures
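
Figure 2 refers to the Breslow estimator of the survival function. As a rough illustration of that standard construction (not the paper's or the care-survival package's code), the sketch below computes the Breslow baseline cumulative hazard from risk scores and converts it to a survival curve. It assumes no tied event times; the function names `breslow_cumulative_hazard` and `survival_curve` are hypothetical.

```python
import numpy as np

def breslow_cumulative_hazard(f, time, event):
    """Breslow baseline cumulative hazard at the sorted observed times (assumes no ties)."""
    order = np.argsort(time)                       # increasing observed time
    t_sorted, e_sorted = time[order], event[order]
    risk = np.exp(f[order])
    # sum_{j: T_j >= T_i} exp(f_j) is a reverse cumulative sum in this ordering
    risk_set = np.cumsum(risk[::-1])[::-1]
    return t_sorted, np.cumsum(e_sorted / risk_set)

def survival_curve(f_x, t_sorted, cum_hazard, t):
    """Estimated survival probability at time(s) t for a subject with risk score f_x."""
    idx = np.searchsorted(t_sorted, np.asarray(t), side="right") - 1
    baseline = np.where(idx >= 0, cum_hazard[np.maximum(idx, 0)], 0.0)
    return np.exp(-baseline * np.exp(f_x))

# Tiny example: three uncensored subjects with equal risk scores.
time = np.array([1.0, 2.0, 3.0])
event = np.array([1.0, 1.0, 1.0])
f = np.zeros(3)
t_sorted, cum_hazard = breslow_cumulative_hazard(f, time, event)
print(cum_hazard)                                  # increments 1/3, 1/2, 1
print(survival_curve(0.0, t_sorted, cum_hazard, [0.5, 1.0]))
```

With all risk scores equal to zero and no censoring, the estimator reduces to the Nelson-Aalen cumulative hazard, which gives a quick sanity check on the implementation.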

Theorems & Definitions (96)

  • Lemma 1: Characterisation of $f_0$
  • Lemma 2: Supremum norm bounds
  • Example 3: Shifted Gaussian kernel
  • Example 4: Polynomial kernel
  • Example 5: Shifted Sobolev kernels
  • Proposition 6: Representation of $\hat{f}_{n,\gamma}$
  • Theorem 7: Rate of convergence
  • Lemma 8: Bound on $H_\gamma$ for a polynomial kernel
  • Lemma 9: Bound on $H_\gamma$ for a shifted first-order Sobolev kernel
  • Theorem 10: Parameter tuning
  • ...and 86 more