A Comprehensive Framework for Evaluating Time to Event Predictions using the Restricted Mean Survival Time

Ariane Cwiling, Vittorio Perduca, Olivier Bouaziz

TL;DR

This paper proposes a novel framework for evaluating RMST estimations that is valid for any asymptotically convergent RMST estimator and works under model misspecification. It also develops a model-agnostic statistical test to assess global variable importance.

Abstract

The restricted mean survival time (RMST) is a widely used quantity in survival analysis due to its straightforward interpretation. For instance, predicting the time to event based on patient attributes is of great interest when analyzing medical data. In this paper, we propose a novel framework for evaluating RMST estimations. A criterion that estimates the mean squared error of an RMST estimator using Inverse Probability Censoring Weighting (IPCW) is presented. A model-agnostic conformal algorithm adapted to right-censored data is also introduced to compute prediction intervals and to evaluate local variable importance. Finally, a model-agnostic statistical test is developed to assess global variable importance. Our framework is valid for any RMST estimator that is asymptotically convergent and works under model misspecification.
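
The IPCW criterion mentioned in the abstract can be illustrated with a minimal sketch. The code below is not the paper's WRSS estimator, whose exact definition is not reproduced in this summary. It assumes a common IPCW construction in which an observation contributes only when its restricted time $\min(T, \tau)$ is observed, with weight equal to the inverse of a Kaplan-Meier estimate of the censoring survival function. The function names `km_censoring_survival` and `ipcw_squared_error` and their argument layout are hypothetical.

```python
import numpy as np


def km_censoring_survival(times, events):
    """Kaplan-Meier estimate of the censoring survival function G.

    Censored observations (events == 0) play the role of 'events' here.
    Returns a function evaluating the left limit G(t-).
    Assumption: ties between events and censorings are handled naively.
    """
    times = np.asarray(times, dtype=float)
    censored = 1 - np.asarray(events, dtype=int)
    uniq = np.unique(times)
    at_risk = np.array([(times >= u).sum() for u in uniq])
    n_cens = np.array([((times == u) & (censored == 1)).sum() for u in uniq])
    surv = np.cumprod(1.0 - n_cens / at_risk)

    def g_minus(t):
        # idx counts the distinct observed times strictly smaller than t
        idx = np.searchsorted(uniq, t, side="left")
        return np.where(idx == 0, 1.0, surv[np.maximum(idx - 1, 0)])

    return g_minus


def ipcw_squared_error(times, events, preds, tau):
    """IPCW-weighted mean squared error of RMST predictions (illustrative)."""
    times = np.asarray(times, dtype=float)
    events = np.asarray(events, dtype=int)
    preds = np.asarray(preds, dtype=float)

    g_minus = km_censoring_survival(times, events)
    y = np.minimum(times, tau)                # restricted observed time
    known = (events == 1) | (times >= tau)    # min(T, tau) is observed
    g = g_minus(y)
    # Censored-before-tau observations get weight 0; the inner np.where
    # only avoids dividing by G for those observations.
    weights = np.where(known, 1.0 / np.where(known, g, 1.0), 0.0)
    return float(np.mean(weights * (y - preds) ** 2))
```

Without censoring, every weight equals 1 and the criterion reduces to the ordinary mean squared error of the predictions against $\min(T, \tau)$.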

Paper Structure (20 sections, 6 theorems, 66 equations, 9 figures, 3 tables, 2 algorithms)

Key Result

Theorem 1

Let $\hat{G}_n$ be a consistent estimator in the weak sense, as defined by Equation \ref{eq::weakconsistency}. Then, under conditional independence (see Equation \ref{eq::conditionalindep}), we have:

Figures (9)

  • Figure 1: Distribution of $1,000$ replications of the WRSS estimator in the scenario A1 and illustration of its convergence towards $\text{MSE}(\tilde{\mu}_{\tau})$ (see Equation \ref{eq::MSEdecomposition}), where $\tilde{\mu}_{\tau}$ represents the limit defined in Equation \ref{eq::convergence}. Two learning models are compared. On the left panel, the oracle model (Equation \ref{eq::closeRMST}) is a linear model fitted on the minimum between the true event times and $\tau$, using the correct link function. On the right panel, a linear model is implemented based on pseudo-observations, including all covariates without interaction terms. The red dotted line illustrates the inseparability term. It also represents the $\text{MSE}(\tilde{\mu}_{\tau})$ for the oracle model, whose imprecision term is null. The blue dotted line represents the $\text{MSE}(\tilde{\mu}_{\tau})$ for the model based on pseudo-observations, whose imprecision term is non-zero.
  • Figure 2: Distribution of $1,000$ replications of the WRSS estimator in the scenario A2 and illustration of its convergence towards $\text{MSE}(\tilde{\mu}_{\tau})$ (see Equation \ref{eq::MSEdecomposition}), where $\tilde{\mu}_{\tau}$ represents the limit defined in Equation \ref{eq::convergence}. Two learning models are compared. On the top panel, the oracle model (Equation \ref{eq::closeRMST}) is a linear model fitted on the minimum between the true event times and $\tau$, using the correct link function. On the bottom panel, a linear model is implemented based on pseudo-observations, including all covariates without interaction terms. In addition, three censoring estimators are compared. From left to right, a Kaplan-Meier method, a Cox model and an RSF model. The red dotted line illustrates the inseparability term. It also represents the $\text{MSE}(\tilde{\mu}_{\tau})$ for the oracle model, whose imprecision term is null. The blue dotted line represents the $\text{MSE}(\tilde{\mu}_{\tau})$ for the model based on pseudo-observations, whose imprecision term is non-zero.
  • Figure 3: Prediction intervals at the $90\%$ level constructed with Algorithm \ref{algo::IPCWsplit} for four learning models: the Kaplan-Meier estimator, the Cox model, the RSF model and the linear model based on pseudo-observations. The training size is $n=4,000$ and the prediction intervals are constructed for $10$ individuals independent from the test set. All data are simulated according to the scenario B. The grey dotted line represents the time horizon $\tau = 3.6$. The red segments are placed at the minimum between the true event times of the test set and $\tau$.
  • Figure 4: Empirical coverage for the prediction intervals constructed with Algorithm \ref{algo::IPCWsplit} for four learning models: the Kaplan-Meier estimator, the Cox model, the RSF model and the linear model based on pseudo-observations. All data were simulated according to the scenario B.
  • Figure 5: Confidence intervals at the $90\%$ level for $p_k,\,k=1,2,3$ (see Equation \ref{eq::pkinterval}), whose values are reported in Table \ref{tab::calibrationTable}. The intervals were computed with the global variable importance measure applied to the fixed data set $D_{n_1}$ and a data set $D_{n_2}$ of size $n_2=500$ simulated independently according to scenario B. Three learning models are considered: the Cox model, the RSF model and the linear model based on pseudo-observations.
  • ...and 4 more figures
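
The prediction intervals in Figures 3 and 4 come from a split conformal procedure adapted to right-censored data. The sketch below is not Algorithm IPCWsplit, which is not reproduced in this summary. It shows a generic weighted split conformal step under stated assumptions: absolute-residual scores on a calibration set, externally supplied weights (for instance IPCW weights), and intervals clipped to $[0, \tau]$ because a restricted time cannot leave that range. It also omits the finite-sample correction that conformal methods usually apply to the quantile level. The names `weighted_quantile` and `conformal_interval` are hypothetical.

```python
import numpy as np


def weighted_quantile(scores, weights, level):
    """Smallest score whose normalized cumulative weight reaches `level`."""
    scores = np.asarray(scores, dtype=float)
    weights = np.asarray(weights, dtype=float)
    order = np.argsort(scores)
    cum = np.cumsum(weights[order]) / weights.sum()
    # First position where the cumulative weight is >= level
    idx = np.searchsorted(cum, level, side="left")
    return scores[order][min(idx, len(scores) - 1)]


def conformal_interval(pred, cal_scores, cal_weights, alpha, tau):
    """Symmetric interval around `pred`, clipped to [0, tau] (illustrative)."""
    q = weighted_quantile(cal_scores, cal_weights, 1.0 - alpha)
    return max(pred - q, 0.0), min(pred + q, tau)
```

For example, `conformal_interval(3.0, [0.5, 1.0, 1.5, 2.0], [1, 1, 1, 1], alpha=0.25, tau=3.6)` uses the weighted quantile 1.5 of the calibration scores and clips the upper bound at the horizon $\tau = 3.6$.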

Theorems & Definitions (15)

  • Theorem 1
  • Theorem 2
  • Remark 1
  • Remark 2
  • Theorem 3
  • Remark 3
  • Remark 4
  • Proof of Theorem \ref{thm::WRSS}
  • Lemma 1
  • Proof
  • ...and 5 more