Data-adaptive gene and pathway-based tests for rare-variant associations with survival outcomes

Yu Wang, Kwang Woo Ahn, Sarah L. Kerns, William Hall, Petra Seibold, Christopher J. Talbot, Ana Vega, Barry S. Rosenstein, Nawaid Usmani, Catharine M. L. West, Liv Veldeman, Paul L. Auer, Zhongyuan Chen

Abstract

Statistical methods for testing aggregate rare-variant genetic associations are typically based on either burden or dispersion tests (or a combination of the two). These methods lack statistical power in the presence of diverse genetic architectures. Moreover, few aggregate rare-variant association methods have been developed specifically for survival data. To address these issues, we propose data-adaptive gene- and pathway-based association tests, built on Schoenfeld residuals in Cox proportional hazards models, for association studies between an aggregate of rare variants and survival outcomes. Our methods improve statistical power while maintaining flexibility across various genetic effect sizes and directions. We develop an efficient R package that enables fast computation and supports data simulation as well as gene- and pathway-level testing. Applying our approach to late bladder toxicity following radiotherapy for non-metastatic prostate cancer, we identify biologically relevant genes and pathways, replicate known signals, and capture additional associations. Our method provides a powerful, adaptive framework for survival-based genetic association studies of rare variants.

Keywords: aSPU, time-to-event outcomes, rare-variant associations, Cox regression, Schoenfeld residuals

Paper Structure

This paper contains 13 sections, 15 equations, 6 figures, 5 tables, 1 algorithm.

Figures (6)

  • Figure 1: Illustration of $\bm Z^s$ permutation
  • Figure 2: The framework of aSPUS. The top-left stacked tables represent permutations of the variant matrix $\bm Z^s$; the shaded one is observed. The weight matrix $\Omega$ is calculated based on observed times and status for each subject in $\bm Z^m$, remaining constant throughout. For each pair of $(\gamma, \gamma_G)$, combining $\Omega$ with each permutation of $\bm Z^s$ in the score function yields the test statistic $u$. Aggregating $u$ from all permutations produces the corresponding $p$-value. Enumerating all pairs of $(\gamma, \gamma_G)$ finds the minimal $p$-value $p_{min}$ for each permutation. Finally, we compare $p_{min}$ values from permutations and the observation, treating $p_{aSPUS}$ as the final $p$-value for the gene or pathway of interest.
  • Figure 3: Simulation results for gene- and pathway-based tests comparing three methods. (a) Power versus effect size $\beta\in(0,0.6]$ for combinations of 1, 3, and 5 causal SNPs in genes containing 10, 50, or 100 SNPs under a correlated SNP structure. Both aSPUS and the Burden test gain power as effect size increases, while aSPUS performs slightly better when genes contain more SNPs. (b) Power versus effect size $\beta\in(0,0.6]$ for pathways consisting of 20 genes with 5, 10, or 15 causal genes and 10 or 50 SNPs per gene. In both settings, aSPUS demonstrates a consistent advantage over the other methods. (c)-(d) Computational benchmarks based on 10 simulation replicates with 2,000 subjects and 80 SNPs in gene-based tests. Although aSPUS requires more CPU time, its memory usage is comparable to the other parametric methods.
  • Figure 4: Summary of data analysis. (a) and (b) show QQ plots at the gene and pathway levels. In gene-level tests, all methods except CopulaFLM have observed and expected $p$-values well aligned along the diagonal. (c) shows the overlap of gene calls among the five methods at FDR < 0.1. (d) decomposes each detected gene into per-variant adjusted effects. (e) summarizes the top 5 genes detected by at least one of the five methods at $\alpha = 0.05$. The central heatmap is colored by the correlation between variant dosages. Genes are grouped and colored with their variants, ordered by $Z$ score from fitting a Cox regression with the variant alone. Left panel columns indicate whether each method calls the gene.
  • Figure 5: Pathway-call overlap among aSPUS, Burden, and SKAT, filtered at raw $p$-value < 0.05
  • ...and 1 more figure
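
The permutation framework described in the Figure 2 caption can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' R package: it covers the gene-level case only (a single $\gamma$, no pathway-level $\gamma_G$), it assumes a Cox null model with no covariates so that the weights reduce to a vector of martingale residuals rather than the matrix $\Omega$ of the paper, and it uses the standard SPU form $\sum_j U_j^{\gamma}$ with $\gamma=\infty$ taken as $\max_j |U_j|$. The function names `null_cox_weights`, `spu_stats`, and `aspu_permutation_test` are hypothetical.

```python
import numpy as np


def null_cox_weights(time, status):
    """Martingale residuals of a covariate-free Cox model (Breslow ties).

    Assumption: with no covariates, the score for variant j equals
    sum_k w[k] * Z[k, j], so w depends only on (time, status) and can be
    computed once and reused for every permutation of Z.
    """
    time = np.asarray(time, dtype=float)
    status = np.asarray(status, dtype=float)
    # at_risk[i, k] is True when subject k is still at risk at time[i]
    at_risk = time[None, :] >= time[:, None]
    n_at_risk = at_risk.sum(axis=1)
    # Breslow jump of the cumulative hazard at each subject's time (0 if censored)
    hazard_jump = status / n_at_risk
    # hazard accumulated by subject k over all event times <= time[k]
    cum_hazard = (at_risk * hazard_jump[:, None]).sum(axis=0)
    return status - cum_hazard


def spu_stats(U, gammas):
    """SPU(gamma) = sum_j U_j**gamma; gamma = inf gives max_j |U_j|."""
    cols = []
    for g in gammas:
        if np.isinf(g):
            cols.append(np.abs(U).max(axis=-1))
        else:
            cols.append((U ** g).sum(axis=-1))
    return np.stack(cols, axis=-1)


def aspu_permutation_test(Z, time, status, gammas=(1, 2, 4, np.inf),
                          n_perm=999, seed=0):
    """Adaptive min-p test over gammas, calibrated by permuting rows of Z."""
    rng = np.random.default_rng(seed)
    Z = np.asarray(Z, dtype=float)
    w = null_cox_weights(time, status)  # fixed across permutations

    obs = np.abs(spu_stats(w @ Z, gammas))
    perm = np.empty((n_perm, len(gammas)))
    for b in range(n_perm):
        Zb = Z[rng.permutation(Z.shape[0])]  # shuffle genotypes across subjects
        perm[b] = np.abs(spu_stats(w @ Zb, gammas))

    # per-gamma p-values for the observed data
    p_obs = ((perm >= obs).sum(axis=0) + 1) / (n_perm + 1)
    # per-gamma p-values for each permutation, ranked against the other
    # permutations; counting the permutation itself supplies the usual "+1"
    ge_counts = (perm[None, :, :] >= perm[:, None, :]).sum(axis=1)
    p_perm = ge_counts / n_perm

    # compare the observed minimal p-value with the permuted minimal p-values
    min_p_obs = p_obs.min()
    min_p_perm = p_perm.min(axis=1)
    p_aspu = ((min_p_perm <= min_p_obs).sum() + 1) / (n_perm + 1)
    return float(p_aspu), {g: float(p) for g, p in zip(gammas, p_obs)}


if __name__ == "__main__":
    # Illustrative simulated data only: 200 subjects, 10 rare variants.
    rng = np.random.default_rng(42)
    Z = rng.binomial(2, 0.03, size=(200, 10))
    time = rng.exponential(1.0 / np.exp(0.5 * Z[:, 0]))
    status = rng.integers(0, 2, size=200)
    p_value, per_gamma = aspu_permutation_test(Z, time, status, n_perm=499)
    print(p_value, per_gamma)
```

Because the weights depend only on the observed times and event indicators, each permutation costs one vector-matrix product, which is what makes reusing them across permutations attractive. The same set of permutations serves twice: once to obtain the per-$\gamma$ $p$-values and once to calibrate the minimal $p$-value, so no nested resampling is needed.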