Higher criticism for rare and weak non-proportional hazard deviations in survival analysis
Alon Kipnis, Ben Galili, Zohar Yakhini
TL;DR
This work tackles the challenge of detecting rare and weak non-proportional hazard differences in survival data, where standard methods such as the log-rank test may fail. It introduces Higher Criticism of hypergeometric p-values (HCHG), which aggregates interval-specific evidence while accommodating non-informative right-censoring and simultaneously identifies the time intervals driving a global difference. Theoretical analysis reveals a phase transition in asymptotic power along a curve ρ(β) in the plane of rarity (β) and intensity (r) parameters: HCHG is asymptotically powerful when r > ρ(β), and its region of full power is larger than that of the log-rank test. Empirical results on simulations and SCANB gene-expression survival data corroborate the advantages of HCHG and its ability to localize departures. The approach offers a principled, permutation-calibrated framework that enhances detection of temporally localized effects and provides interpretable interval-level insights for intervention and follow-up analyses.
Abstract
We propose a method for comparing survival data based on the higher criticism of p-values obtained from multiple exact hypergeometric tests. The method accommodates non-informative right-censoring and is sensitive to hazard differences in unknown and relatively rare time intervals. It attains much better power against such differences than the log-rank test and its variants. Using simulations and actual gene expression data, we demonstrate the advantage of our method over existing tests in detecting rare and weak non-proportional hazard differences. Additionally, we analyze the asymptotic power of our method and other tests under a theoretical framework describing two groups whose failure rates are usually identical over time, except in a few unknown instances where one group's failure rate is higher. Our test's power undergoes a phase transition across the plane of rarity and intensity parameters that mirrors the phase transition of higher criticism in two-sample settings with rare and weak normal and Poisson means. The region of the plane in which our method has asymptotically full power is larger than the corresponding region for the log-rank test.
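
To make the construction concrete, the following is a minimal sketch of the two ingredients named in the abstract: an exact hypergeometric p-value per time interval, and a higher criticism statistic over those p-values. It is an illustration under stated assumptions, not the authors' implementation. The function names, the toy counts, the one-sided direction of the test (excess events in group 2), the truncation fraction `gamma=0.4`, and the textbook Donoho-Jin form of the statistic are all assumptions made here; the paper's exact variant may differ, and the permutation calibration is not shown.

```python
from math import comb, sqrt


def hypergeom_sf(k, N, K, n):
    """P(X >= k) for X ~ Hypergeometric(population N, K marked items, n draws)."""
    lo = max(k, 0, n - (N - K))   # smallest support value that is >= k
    hi = min(K, n)                # largest support value
    tail = sum(comb(K, x) * comb(N - K, n - x) for x in range(lo, hi + 1))
    return tail / comb(N, n)


def interval_pvalues(at_risk_1, at_risk_2, events_1, events_2):
    """One-sided exact p-value per interval, as (interval index, p-value) pairs.

    Conditional on the number at risk and the total number of events in an
    interval, the event count in group 2 is hypergeometric under equal hazards.
    Intervals without events carry no information and are skipped.
    """
    out = []
    rows = zip(at_risk_1, at_risk_2, events_1, events_2)
    for t, (n1, n2, d1, d2) in enumerate(rows):
        d = d1 + d2
        if d == 0:
            continue
        out.append((t, hypergeom_sf(d2, n1 + n2, n2, d)))
    return out


def higher_criticism(pvals, gamma=0.4):
    """Textbook HC statistic and the p-value at which its maximum is attained."""
    p_sorted = sorted(pvals)
    T = len(p_sorted)
    best, threshold = float("-inf"), None
    # Search only the smallest gamma-fraction of p-values (assumed convention).
    for i, p in enumerate(p_sorted[: max(1, int(gamma * T))], start=1):
        if not 0.0 < p < 1.0:     # guard against a zero denominator
            continue
        z = sqrt(T) * (i / T - p) / sqrt(p * (1.0 - p))
        if z > best:
            best, threshold = z, p
    return best, threshold


# Hypothetical toy data: six intervals, with excess events in group 2
# confined to interval 3 (a single, localized hazard difference).
at_risk_1 = [50, 48, 46, 45, 43, 41]
at_risk_2 = [50, 48, 47, 45, 38, 36]
events_1 = [2, 2, 1, 2, 2, 1]
events_2 = [2, 1, 2, 7, 2, 1]

pairs = interval_pvalues(at_risk_1, at_risk_2, events_1, events_2)
hc, thr = higher_criticism([p for _, p in pairs])
# Intervals whose p-value falls at or below the HC maximizer are flagged
# as the candidates driving the global difference.
flagged = [t for t, p in pairs if thr is not None and p <= thr]
print(hc, flagged)
```

In this sketch the HC value by itself is not a p-value: it would still have to be compared against a null distribution, for example one obtained by permuting group labels. The flagged set is only a heuristic reading of which intervals contribute to the maximum.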
