Spectral Survival Analysis

Chengzhi Shi, Stratis Ioannidis

TL;DR

This work identifies a fundamental connection between rank regression and the CoxPH model, uses it to adapt and extend the so-called spectral method for rank regression to survival analysis, and empirically verifies the method's scalability on multiple real-world high-dimensional datasets.

Abstract

Survival analysis is widely deployed in a diverse set of fields, including healthcare, business, ecology, etc. The Cox Proportional Hazard (CoxPH) model is a semi-parametric model often encountered in the literature. Despite its popularity, wide deployment, and numerous variants, scaling CoxPH to large datasets and deep architectures poses a challenge, especially in the high-dimensional regime. We identify a fundamental connection between rank regression and the CoxPH model: this allows us to adapt and extend the so-called spectral method for rank regression to survival analysis. Our approach is versatile, naturally generalizing to several CoxPH variants, including deep models. We empirically verify our method's scalability on multiple real-world high-dimensional datasets; our method outperforms legacy methods w.r.t. predictive performance and efficiency.
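
As a reference point for the CoxPH objective mentioned in the abstract, the following is a minimal sketch of the standard negative log partial likelihood for a linear CoxPH model. It is not the spectral method proposed in the paper; the function name, the linear score $\bm{\theta}^\top \bm{x}_i$, and the assumption that observed times have no ties are illustrative choices made here.

```python
import numpy as np

def cox_neg_log_partial_likelihood(theta, X, times, events):
    """Negative log partial likelihood of a linear CoxPH model.

    Assumes (for illustration only) that observed times contain no ties.
    theta:  (d,) parameter vector
    X:      (n, d) feature matrix
    times:  (n,) observed times (event or censoring, whichever came first)
    events: (n,) 1 if the event was observed, 0 if censored
    """
    scores = X @ theta                  # log relative hazard of each sample
    order = np.argsort(-times)          # sort by decreasing observed time
    scores, events = scores[order], events[order]
    # After sorting, the risk set of sample i is the prefix 0..i, so a
    # running log-sum-exp gives log(sum of exp(scores) over the risk set).
    log_risk = np.logaddexp.accumulate(scores)
    # Only samples with an observed event contribute a likelihood term.
    return -np.sum((scores - log_risk)[events == 1])
```

With all-zero features and three uncensored samples, each event is equally likely within its risk set, so the loss evaluates to $\log 1 + \log 2 + \log 3 = \log 6$.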

Paper Structure

This paper contains 36 sections, 4 theorems, 61 equations, 4 figures, 10 tables, and 2 algorithms.

Key Result

Theorem 1

The stationary point of equation lin pi satisfies the balance equations of a continuous-time Markov Chain with transition rates where $[n]_{+}=\{i:\sigma_i(\bm{\pi})\geq 0\}$, $[n]_{-}=\{i:\sigma_i(\bm{\pi})\leq 0\}$, and

Figures (4)

  • Figure 1: Comparison of our proposed Spectral method against SOTA competitors on eight datasets, w.r.t. predictive performance (CI, $\uparrow$), runtime (s, $\downarrow$), and memory (MB, $\downarrow$). All methods executed over the same dataset are connected to the proposed method Spectral; not all methods were applicable to all datasets, and some ran out of memory (see Table \ref{tab: LUNG1 performance.}). Spectral consistently achieves superior predictive performance over competitors while using less memory. It also exhibits marked acceleration over the DeepSurv base model, with which it shares the same objective; it is comparable in runtime to other methods with simpler objectives which, nevertheless, perform worse in predictive performance.
  • Figure 2: Illustration of the standard survival analysis problem setting. Each sample is associated with a $d$-dimensional feature vector $\bm{x}_i \in \mathbb{R}^d$, an event time $T_i>0$, and a censoring time $C_i>0$. Event times are observed only if they occur before the censoring time: for example, no event is observed for sample $\bm{x}_2$ before its censoring time $C_2$.
  • Figure 3: Comparison of memory (MB) and runtime per sample (s) between Spectral and competitor methods across survival analysis (SA) and counting process (CP) datasets. All values are also reported in Table \ref{tab: runtime and memory} in App. \ref{app: additional experiments}. W.r.t. memory, Spectral outperforms all other baselines in all cases. Moreover, the advantage is more pronounced in the computationally intensive cases: LUNG1 (high $d$) and ADS100K (high $n$). In terms of runtime per sample, Spectral consistently converges faster than its DeepSurv base model, with which it shares the same objective. Its runtime is comparable to that of other methods that use simpler objectives; however, these simpler methods suffer from worse predictive performance compared to Spectral and, often, DeepSurv (see also Table \ref{tab:remaining_performance} and Fig. \ref{fig:teaser_standard}).
  • Figure 4: Illustration of the setting of Chen et al. \cite{chen2023gateway}, modeling ad clicks from a survival analysis (counting process) perspective. Here $E$ denotes impressions, $\times$ denotes an event (ad click), and $j$ denotes journeys. Not all journeys lead to an ad click. The goal is to predict the time an ad is clicked from past impressions and the features associated with them.
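
The censoring setup of Figure 2 and the CI metric of Figure 1 can be sketched as follows. This is a minimal illustration, assuming that CI refers to Harrell's concordance index and that an event occurring exactly at the censoring time counts as observed; the function names `observe` and `concordance_index` are hypothetical and do not come from the paper.

```python
import numpy as np

def observe(event_times, censor_times):
    """Turn latent (T_i, C_i) pairs into what is actually observed.

    Returns the observed time min(T_i, C_i) and an indicator that is 1
    when the event happened no later than censoring (assumption: ties
    count as observed events).
    """
    observed = np.minimum(event_times, censor_times)
    delta = (event_times <= censor_times).astype(int)
    return observed, delta

def concordance_index(observed, delta, risk):
    """Harrell's concordance index, computed with a plain O(n^2) loop.

    A pair (i, j) is comparable when sample i has an observed event and a
    strictly smaller observed time than j. The pair is concordant when the
    model gives i the higher risk; ties in risk count as one half.
    """
    concordant, comparable = 0.0, 0
    n = len(observed)
    for i in range(n):
        if delta[i] != 1:
            continue  # censored samples cannot anchor a comparable pair
        for j in range(n):
            if observed[i] < observed[j]:
                comparable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0
                elif risk[i] == risk[j]:
                    concordant += 0.5
    return concordant / comparable if comparable else float("nan")

# Three samples; the second is censored at time 1 before its event at time 5.
obs, delta = observe(np.array([2.0, 5.0, 4.0]), np.array([3.0, 1.0, 6.0]))
ci = concordance_index(obs, delta, risk=np.array([3.0, 2.0, 1.0]))
```

The quadratic loop is chosen for readability only; it would not scale to datasets of the size discussed in the figures.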

Theorems & Definitions (4)

  • Theorem 1
  • Theorem 1: Yildiz et al. \cite{yildiz21a}
  • Lemma 1
  • Corollary 1