Higher criticism for rare and weak non-proportional hazard deviations in survival analysis

Alon Kipnis, Ben Galili, Zohar Yakhini

TL;DR

This work tackles the detection of rare and weak non-proportional hazard differences in survival data, a setting in which standard methods such as the log-rank test may fail. It introduces the higher criticism of hypergeometric p-values (HCHG), which aggregates interval-specific evidence while accommodating non-informative right-censoring and also identifies the time intervals driving a global difference. Theoretical analysis reveals a phase transition in asymptotic power along a curve ρ(β): HCHG is asymptotically powerful when r > ρ(β), whereas the log-rank test is asymptotically powerless for any β > 1/2. Results on simulations and on SCANB gene-expression survival data corroborate the advantages of HCHG and its ability to localize departures. The approach offers a principled, permutation-calibrated framework for detecting temporally localized effects and provides interpretable interval-level insights for intervention and follow-up analyses.

Abstract

We propose a method for comparing survival data based on the higher criticism of p-values obtained from multiple exact hypergeometric tests. The method accommodates non-informative right-censorship and is sensitive to hazard differences in unknown and relatively rare time intervals. It attains much better power against such differences than the log-rank test and its variants. We demonstrate the usefulness of our method in detecting rare and weak non-proportional hazard differences compared to existing tests, using simulations and actual gene expression data. Additionally, we analyze the asymptotic power of our method and other tests under a theoretical framework describing two groups experiencing failure rates that are usually identical over time, except in a few unknown instances where one group's failure rate is higher. Our test's power undergoes a phase transition across the plane of rarity and intensity parameters that mirrors the phase transition of higher criticism in two-sample settings with rare and weak normal and Poisson means. The region of the plane in which our method has asymptotically full power is larger than the corresponding region for the log-rank test.
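
To make the construction in the abstract concrete, the following is a minimal sketch of a higher criticism statistic computed from per-interval hypergeometric p-values. It is an illustration under stated assumptions, not the authors' implementation: the one-sided form of the p-value, the search fraction `gamma=0.2`, the skipping of intervals without events, the function names, and the toy counts are all assumptions made here, since the paper's exact definitions are not reproduced in this summary.

```python
from math import comb, sqrt


def hypergeom_pvalue(n_x, n_y, o_x, o_y):
    """One-sided exact p-value P(X >= o_y), where X counts group-y events when
    o_x + o_y events are drawn without replacement from n_x + n_y subjects at risk."""
    draws = o_x + o_y
    denom = comb(n_x + n_y, draws)
    tail = sum(comb(n_y, k) * comb(n_x, draws - k)
               for k in range(o_y, min(draws, n_y) + 1))
    return tail / denom


def higher_criticism(pvalues, gamma=0.2):
    """Largest standardized gap between i/n and the i-th smallest p-value,
    searched over the smallest gamma-fraction of p-values (gamma is an assumed value)."""
    p = sorted(pvalues)
    n = len(p)
    if n == 0:
        return float("-inf")
    best = float("-inf")
    for i in range(1, max(1, int(gamma * n)) + 1):
        p_i = p[i - 1]
        if 0.0 < p_i < 1.0:  # skip degenerate values to avoid dividing by zero
            best = max(best, sqrt(n) * (i / n - p_i) / sqrt(p_i * (1.0 - p_i)))
    return best


def hchg(at_risk_x, at_risk_y, events_x, events_y, gamma=0.2):
    """Higher criticism of the per-interval p-values; intervals with no events are skipped."""
    pvals = [hypergeom_pvalue(nx, ny, ox, oy)
             for nx, ny, ox, oy in zip(at_risk_x, at_risk_y, events_x, events_y)
             if ox + oy > 0]
    return higher_criticism(pvals, gamma), pvals


# Made-up counts for six intervals: group y has excess events in the third interval only.
at_risk_x = [50, 48, 47, 45, 44, 42]
at_risk_y = [50, 49, 47, 40, 39, 38]
events_x = [1, 1, 1, 1, 1, 0]
events_y = [1, 1, 6, 1, 1, 0]

hc_value, pvals = hchg(at_risk_x, at_risk_y, events_x, events_y)
print("p-values:", [round(p, 4) for p in pvals])
print("HC statistic:", round(hc_value, 3))
```

In this toy table the smallest p-value belongs to the third interval, which is the kind of interval-level localization the method is described as providing. The exact tail is computed with integer binomial coefficients so the sketch needs only the standard library.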

Paper Structure

This paper contains 46 sections, 10 theorems, 138 equations, 7 figures, 4 tables, and 2 algorithms.

Key Result

Theorem 1

Consider testing $H_0$ versus $H_1$ as in \eqref{eq:model_full1} when $x_0$, $y_0$, $\{\bar{\lambda}_t\}$, $\epsilon$, and $\delta$ are calibrated to $T$ as in \eqref{eq:calibration_kappa}-\eqref{eq:calibration_rates}. If $r > \rho(\beta)$, then $\mathrm{HCHG}_T$ of \eqref{eq:HC_def} is asymptotically powerful.

Figures (7)

  • Figure 1: Survival data thought to experience temporally rare excessive hazard in Group $y$ compared to Group $x$. Higher criticism of the hypergeometric p-values (HCHG) indicates an excessive hazard in Group $y$, while the log-rank test does not. Top (figure): Kaplan-Meier curves of the data. Gray bars indicate membership in the set $\Delta^\star$ of time instances providing the best evidence for a global rare hazard difference. Intervals with censoring events are decorated with $+$. Bottom (table): At-risk subjects and events occurring in the two groups over $T=84$ time intervals, and the corresponding hypergeometric p-values $\{p_t\}_{t=1}^T$ of \eqref{eq:pvals_def}.
  • Figure 2: Phase transition and regions of asymptotic power under the piece-wise homogeneous exponential decay model with rare and weak hazard departures of \eqref{eq:data_model0}-\eqref{eq:lambda_prime_def}. Here $\beta$ controls the number of intervals of excessive hazard and $r$ controls their strength. Our HCHG procedure is asymptotically powerful for $r > \rho(\beta)$. The log-rank test is asymptotically powerless for any $\beta>1/2$. All tests based on randomized hypergeometric tests are asymptotically powerless for $r < \rho(\beta)$.
  • Figure 3: Empirical power of higher criticism of hypergeometric (HCHG) p-values (left) and log-rank (right) at level $\alpha=0.05$. The curves $\rho(\beta)$ and $\hat{\rho}(\beta)$ are the theoretical and Monte-Carlo simulated phase transitions of HCHG, respectively. The line $\beta=0.5$ and the curve $\hat{\rho}_{\mathrm{LR}_T}(\beta)$ are the theoretical and Monte-Carlo simulated phase transitions of the log-rank statistic of \eqref{eq:logrank}, respectively.
  • Figure 4: Configurations with significant empirical power differences between a test based on $\mathrm{HCHG}_T$ of \eqref{eq:HC_def} and tests based on other statistics. A gray point indicates no significant power difference in favor of either statistic. We used $N=1{,}000$ experiments in each $(r,\beta)$ configuration. Each experiment simulates a sample from the piece-wise exponential decay model \eqref{eq:data_model1} with rare and weak departures over $T=1{,}000$ time intervals.
  • Figure 5: Simulated null distribution of $\mathrm{HCHG}_T$. Histogram of $\mathrm{HCHG}_T$ over $N=50{,}000$ random group assignments with event times taken from the SCANB gene expression dataset. The $0.95$ quantile is indicated.
  • ...and 2 more figures
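
The null simulation described in the Figure 5 caption (recomputing the statistic over random group assignments while keeping the observed outcomes fixed) can be sketched as follows. This is a generic permutation-calibration sketch, not the authors' code: `event_excess` is a placeholder statistic standing in for $\mathrm{HCHG}_T$, and the data, function names, number of permutations, and seed are hypothetical.

```python
import math
import random


def permutation_null(statistic, outcomes, labels, n_perm=1000, seed=0):
    """Recompute `statistic` under random reassignments of the group labels.
    Outcomes stay fixed; only group membership is shuffled."""
    rng = random.Random(seed)
    shuffled = list(labels)
    null = []
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        null.append(statistic(outcomes, shuffled))
    return sorted(null)


def empirical_quantile(sorted_values, level=0.95):
    """Order statistic at the given level of an ascending list."""
    idx = max(0, math.ceil(level * len(sorted_values)) - 1)
    return sorted_values[idx]


def event_excess(outcomes, labels):
    """Placeholder statistic: events observed in group y minus events in group x."""
    return sum(e if g == "y" else -e for e, g in zip(outcomes, labels))


# Made-up event indicators (1 = event, 0 = censored) for twelve subjects.
outcomes = [1, 0, 1, 1, 0, 1, 1, 1, 0, 1, 1, 1]
labels = ["x"] * 6 + ["y"] * 6

observed = event_excess(outcomes, labels)
null = permutation_null(event_excess, outcomes, labels, n_perm=2000, seed=1)
threshold = empirical_quantile(null, 0.95)
p_value = sum(v >= observed for v in null) / len(null)
print("observed:", observed, "0.95 quantile:", threshold, "p-value:", p_value)
```

A test at level $\alpha=0.05$ would reject when the observed statistic exceeds the simulated $0.95$ quantile; replacing the placeholder with a function that computes the statistic of interest from event times and labels leaves the calibration loop unchanged.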

Theorems & Definitions (19)

  • Theorem 1
  • Theorem 2
  • Theorem 3
  • Lemma 1
  • proof
  • Lemma 2
  • proof
  • Lemma 3
  • proof
  • Lemma 4
  • ...and 9 more