Survival Analysis with Graph-Based Regularization for Predictors

Liyan Xie, Xi He, Pinar Keskinocak, Yao Xie

TL;DR

This work introduces a graph-regularized maximum partial likelihood approach for Cox survival models to address high-dimensional, correlated predictors. By encoding predictor relationships in a graph G and employing the norm $\|\boldsymbol{\beta}\|_{G,\bm{\tau}}$, the method jointly selects groups of related variables and improves prediction via a group-lasso–type formulation achieved through predictor duplication. The authors establish finite-sample recovery guarantees and asymptotic normality, and demonstrate through simulations and real organ transplantation data that incorporating graph structure yields better estimation accuracy and higher concordance (c-index) than classic regularizers. The approach is applicable beyond organ transplantation and provides a scalable framework for graph-informed variable selection in survival analysis, with clear paths for extending to weighted or donor–recipient networks. Practical impact includes more reliable identification of survival-related factors and improved risk prediction in settings with correlated predictors and right-censored data.

Abstract

We study the variable selection problem in survival analysis to identify the most important factors affecting survival time. Our method incorporates prior knowledge of mutual correlations among variables, represented through a graph. We use the Cox proportional hazards model with a graph-based regularizer for variable selection. We present a computationally efficient algorithm that solves the graph-regularized maximum likelihood problem by establishing connections with the group lasso, and we provide theoretical guarantees on the recovery error and the asymptotic distribution of the proposed estimators. The improved performance of the proposed approach compared with existing methods is demonstrated on both synthetic and real organ transplantation datasets.
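The connection with the group lasso through predictor duplication can be illustrated with a minimal sketch. It assumes the construction used in the cited yu2016sparse line of work: each node of the graph contributes one group made of the node and its neighbors, and a predictor that belongs to several neighborhoods is copied once per group so that the groups no longer overlap. The function names `duplicate_predictors` and `collapse_coefficients` are hypothetical and are not taken from the paper.

```python
import numpy as np

def duplicate_predictors(X, adjacency):
    """Copy columns of X so that each node's neighborhood forms its own group.

    Assumption: the neighborhood of node i is i itself plus every j with
    adjacency[i, j] != 0. Returns the expanded design matrix, the original
    column index of each expanded column, and the group label of each column.
    """
    p = X.shape[1]
    columns, groups = [], []
    for i in range(p):
        neighborhood = [i] + [j for j in range(p) if j != i and adjacency[i, j]]
        columns.extend(neighborhood)
        groups.extend([i] * len(neighborhood))
    columns = np.array(columns)
    groups = np.array(groups)
    return X[:, columns], columns, groups

def collapse_coefficients(v, columns, p):
    """Sum the coefficients of all copies of a predictor to recover beta."""
    beta = np.zeros(p)
    np.add.at(beta, columns, v)
    return beta

# Path graph 0 - 1 - 2 on three predictors.
adjacency = np.array([[0, 1, 0],
                      [1, 0, 1],
                      [0, 1, 0]])
X = np.arange(12, dtype=float).reshape(4, 3)
X_dup, columns, groups = duplicate_predictors(X, adjacency)

# Any coefficient vector on the expanded matrix gives the same linear
# predictor as the collapsed vector on the original matrix.
v = np.ones(X_dup.shape[1])
beta = collapse_coefficients(v, columns, p=3)
```

Under this assumption, a standard non-overlapping group-lasso solver applied to `X_dup` with group labels `groups` plays the role of the graph-based penalty, and `collapse_coefficients` maps its solution back to the original predictors.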

Paper Structure

This paper contains 18 sections, 6 theorems, 56 equations, 5 figures, 22 tables, 1 algorithm.

Key Result

Theorem 1

Under Assumptions \ref{assup1} and \ref{assup2}, let $\tau_{\min} = \min_{1\leq i\leq p}\tau_i$. For the optimal solution $\widehat{ \bm\beta}$ of problem \eqref{eqn:opt_graph}, there exist constants $D,D',K,K'$ such that with probability at least $1-pD e^{-Kn\lambda^2 \tau_{\min}^2/p} - p^2D'e^{-K'n \tau_{\min}^4\

Figures (5)

  • Figure 1: Graph structure for correlation of variables in a pediatric kidney transplant data set: the inverse covariance matrix of the numerical variables in the living donor dataset (left) and the deceased donor dataset (right); more details are given in Section \ref{sec:data}.
  • Figure 2: Illustration of three predictor graph topologies used in the simulation. From left to right: the sparse graph, the ring graph, and the graph with three communities.
  • Figure 3: Boxplots of the model c-indices on the living (top) and deceased (bottom) donor datasets. The blue line indicates the median c-index of the proposed method; the red line indicates where the c-index equals 0.5 (random guessing).
  • Figure 4: Inverse covariance of the numerical variables in the pbcseq dataset.
  • Figure 5: Boxplot of the model c-indices on the pbcseq dataset. The blue line indicates the median c-index of the proposed method.
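
The c-index shown in Figures 3 and 5 can be made concrete with a short sketch. It assumes Harrell's pairwise definition for right-censored data: a pair is comparable when the subject with the shorter observed time experienced the event, and it is concordant when that subject also has the higher predicted risk. The function name `concordance_index` is hypothetical, pairs with tied times are skipped for simplicity, and the paper's exact evaluation protocol may differ.

```python
import numpy as np

def concordance_index(time, event, risk):
    """Harrell-style c-index; higher risk should mean shorter survival.

    time  : observed times (event or censoring)
    event : 1 if the event was observed, 0 if right-censored
    risk  : predicted risk scores, e.g. the Cox linear predictor
    """
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=bool)
    risk = np.asarray(risk, dtype=float)
    concordant, comparable = 0.0, 0
    n = len(time)
    for i in range(n):
        if not event[i]:
            continue  # a censored subject cannot anchor a comparable pair
        for j in range(n):
            if time[i] < time[j]:
                comparable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0
                elif risk[i] == risk[j]:
                    concordant += 0.5  # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable

# Three subjects, the last one censored.
times = [1.0, 2.0, 3.0]
events = [1, 1, 0]
perfect = concordance_index(times, events, risk=[3.0, 2.0, 1.0])
constant = concordance_index(times, events, risk=[1.0, 1.0, 1.0])
```

With a constant risk score every comparable pair is tied, which gives the 0.5 level marked as random guessing in Figure 3.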

Theorems & Definitions (10)

  • Remark 1: Motivation for Graph-based Regularization
  • Theorem 1: Finite Sample Bounds
  • Theorem 2: Asymptotic Normality
  • Lemma 1: [sun2014network, Lemma A.2]
  • Lemma 2: [sun2014network, Lemma A.3]
  • Lemma 3
  • proof
  • Lemma 4: [yu2016sparse, Lemma 2]
  • proof : Proof of Theorem \ref{thm:oracle}
  • proof : Proof of Theorem \ref{thm:asynormal}