Toward Capturing Genetic Epistasis From Multivariate Genome-Wide Association Studies Using Mixed-Precision Kernel Ridge Regression

Hatem Ltaief, Rabab Alomairy, Qinglei Cao, Jie Ren, Lotfi Slim, Thorsten Kurth, Benedikt Dorschner, Salim Bougouffa, Rached Abdelkhalak, David E. Keyes

TL;DR

This work addresses the computational bottlenecks of multivariate GWAS by reframing kernel ridge regression within a tile-centric, mixed-precision GPU framework. It introduces precision-adaptive RR, INT8/Tensor Core–accelerated distance computations, and a four-precision Cholesky solver managed by PaRSEC to enable end-to-end GWAS on massive cohorts, including 305K real UK Biobank samples and 13M synthetic cases. The approach yields up to $1.805$ mixed-precision ExaOp/s on Alps and demonstrates superior predictive accuracy of KRR over RR (with robust MSPE and Pearson correlations) while using FP8 for significant portions of the workload. The results suggest practical pathways to scale epistasis analyses to national- or continental-scale populations, with strong implications for precision medicine and 3D/multi-omics GWAS in future GPU-accelerated pipelines.

Abstract

We exploit the widening margin in tensor-core performance between [FP64/FP32/FP16/INT8, FP64/FP32/FP16/FP8/INT8] on NVIDIA [Ampere, Hopper] GPUs to boost the performance of output accuracy-preserving mixed-precision computation of Genome-Wide Association Studies (GWAS) of 305K patients from the UK BioBank, the largest-ever GWAS cohort studied for genetic epistasis using a multivariate approach. Tile-centric adaptive-precision linear algebraic techniques motivated by reducing data motion gain enhanced significance with low-precision GPU arithmetic. At the core of Kernel Ridge Regression (KRR) techniques for GWAS lie compute-bound cubic-complexity matrix operations that inhibit scaling to aspirational dimensions of the population, genotypes, and phenotypes. We accelerate KRR matrix generation by redesigning the computation for Euclidean distances to engage INT8 tensor cores while exploiting symmetry. We accelerate solution of the regularized KRR systems by deploying a new four-precision Cholesky-based solver, which, at 1.805 mixed-precision ExaOp/s on a nearly full Alps system, outperforms the state-of-the-art CPU-only REGENIE GWAS software by five orders of magnitude.
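The two accelerated steps named in the abstract, distance-based kernel generation and a Cholesky-based regularized solve, can be illustrated with a small single-precision-path sketch in NumPy. This is not the authors' implementation: it runs on the CPU, uses no tiles or tensor cores, and solves in FP64 only. The Gaussian kernel, the function names (`squared_distances_int`, `krr_fit`), and the parameters `gamma` and `alpha` are assumptions made for illustration. The point it shows is that squared Euclidean distances reduce to an integer Gram matrix via ||g_i - g_j||^2 = ||g_i||^2 + ||g_j||^2 - 2 g_i·g_j, which is the kind of symmetric matrix product that INT8 inputs with wider integer accumulation can serve exactly when genotypes are small integers.

```python
import numpy as np

def squared_distances_int(G):
    """Pairwise squared Euclidean distances between rows of an integer
    genotype matrix G (assumed here to hold small values such as 0/1/2).

    The dominant cost is the Gram matrix G @ G.T, an integer matrix
    product. The result is symmetric, so a tiled implementation would
    only need to compute one triangle.
    """
    G8 = G.astype(np.int8)       # small genotype values fit in INT8
    G32 = G8.astype(np.int32)    # emulate INT32 accumulation on the CPU
    gram = G32 @ G32.T           # exact integer dot products
    sq_norms = np.diag(gram)     # ||g_i||^2 sits on the diagonal
    return sq_norms[:, None] + sq_norms[None, :] - 2 * gram

def krr_fit(G, y, gamma, alpha):
    """Solve (K + alpha * I) w = y with a Cholesky factorization.

    K is a Gaussian kernel on the squared distances (an assumed kernel
    choice); alpha is the ridge regularization parameter.
    """
    D2 = squared_distances_int(G).astype(np.float64)
    K = np.exp(-gamma * D2)
    L = np.linalg.cholesky(K + alpha * np.eye(K.shape[0]))
    z = np.linalg.solve(L, y)        # forward solve  L z = y
    return np.linalg.solve(L.T, z)   # backward solve L^T w = z

# Tiny synthetic example: 6 individuals, 10 SNPs coded 0/1/2.
rng = np.random.default_rng(0)
G = rng.integers(0, 3, size=(6, 10))
y = rng.standard_normal(6)
w = krr_fit(G, y, gamma=0.1, alpha=0.5)
```

Because the genotype values are integers, the Gram-matrix route gives the distances exactly rather than approximately, which is why dropping to INT8 inputs need not cost accuracy at this stage. Precision trade-offs would arise only in the later floating-point factorization.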

Paper Structure

This paper contains 27 sections, 8 equations, 14 figures, 1 table, and 1 algorithm.

Figures (14)

  • Figure 1: Genome-wide association study [gwasnature].
  • Figure 2: Mixed-precision symmetric rank-k update (SYRK).
  • Figure 3: Leveraging the INT8 / FP8 / FP16 / FP32 / FP64 KRR-based multivariate GWAS for genetic epistasis.
  • Figure 4: Precision heatmaps.
  • Figure 5: MSPE comparisons between diseases using $305{,}880$ patients and $43{,}333$ SNPs from UK BioBank.
  • ...and 9 more figures