Rapid GPU-Based Pangenome Graph Layout

Jiajie Li, Jan-Niklas Schmelzle, Yixiao Du, Simon Heumos, Andrea Guarracino, Giulia Guidi, Pjotr Prins, Erik Garrison, Zhiru Zhang

TL;DR

This work tackles the costly problem of pangenome graph layout by introducing a GPU-accelerated implementation of the path-guided SGD layout algorithm. It characterizes the workload as data-parallel but memory-bound, and delivers three targeted optimizations (a cache-friendly data layout, coalesced random states, and warp merging) along with a scalable quality metric called sampled path stress. On 24 human chromosomal pangenomes, the approach achieves an average speedup of $57.3\times$ over a 32-core CPU baseline, reducing runtime from hours to minutes while preserving layout quality, as indicated by the correlation between $path\_stress$ and $sampled\_path\_stress$ across test cases. The work includes an ablation study and a case study of performance-quality trade-offs, and it is designed to be integrated into the ODGI framework for broad adoption and interactive pangenome visualization.

Abstract

Computational pangenomics is an emerging field that studies genetic variation using a graph structure encompassing multiple genomes. Visualizing pangenome graphs is vital for understanding genome diversity. Yet, handling large graphs can be challenging due to the high computational demands of the graph layout process. In this work, we conduct a thorough performance characterization of a state-of-the-art pangenome graph layout algorithm, revealing significant data-level parallelism, which makes GPUs a promising option for compute acceleration. However, irregular data access and the algorithm's memory-bound nature present significant hurdles. To overcome these challenges, we develop a solution implementing three key optimizations: a cache-friendly data layout, coalesced random states, and warp merging. Additionally, we propose a quantitative metric for scalable evaluation of pangenome layout quality. Evaluated on 24 human whole-chromosome pangenomes, our GPU-based solution achieves a $57.3\times$ speedup over the state-of-the-art multithreaded CPU baseline without layout quality loss, reducing execution time from hours to minutes.
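
To give a rough sense of what a sampled, path-based stress metric might look like, the sketch below compares layout distances with along-path nucleotide distances for randomly sampled node pairs. This is an illustration under stated assumptions and not the paper's definition: the relative squared-error form, the single 2D point per node, and all names (`sampled_path_stress`, `samples_per_path`, `positions`, `lengths`) are hypothetical.

```python
import math
import random


def sampled_path_stress(paths, positions, lengths, samples_per_path=1000, seed=0):
    """Illustrative sampled stress (assumed form, not the paper's exact metric).

    paths     : list of paths, each a list of node ids in path order
    positions : dict node id -> (x, y) layout coordinate (one point per node, an assumption)
    lengths   : dict node id -> nucleotide length of the node
    """
    rng = random.Random(seed)
    total, count = 0.0, 0
    for path in paths:
        if len(path) < 2:
            continue
        # Nucleotide offset of each step's start along the path.
        offsets = [0]
        for node in path[:-1]:
            offsets.append(offsets[-1] + lengths[node])
        for _ in range(samples_per_path):
            # Sample two distinct steps of the same path.
            i, j = rng.sample(range(len(path)), 2)
            d_path = abs(offsets[i] - offsets[j])
            if d_path == 0:
                continue
            (xi, yi), (xj, yj) = positions[path[i]], positions[path[j]]
            d_layout = math.hypot(xi - xj, yi - yj)
            # Relative squared error between layout and path distance.
            total += ((d_layout - d_path) / d_path) ** 2
            count += 1
    return total / count if count else 0.0


# Toy example: three nodes of lengths 5, 4, 3 laid out on a line.
paths = [[0, 1, 2]]
lengths = {0: 5, 1: 4, 2: 3}
perfect = {0: (0.0, 0.0), 1: (5.0, 0.0), 2: (9.0, 0.0)}
distorted = {0: (0.0, 0.0), 1: (5.0, 0.0), 2: (18.0, 0.0)}
stress_perfect = sampled_path_stress(paths, perfect, lengths)
stress_distorted = sampled_path_stress(paths, distorted, lengths)
```

Because the number of sampled pairs per path is fixed, the cost of this sketch grows with the number of paths rather than with the number of node pairs, which is the property that makes a sampled metric attractive for whole-chromosome graphs.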

Paper Structure (33 sections, 2 equations, 17 figures, 11 tables, 1 algorithm)

Figures (17)

  • Figure 1: A variation graph and its visualization example --- the three genomes are depicted in different colors; the path of interconnected nodes represents the original genome.
  • Figure 2: Layout of the HLA-DRB1 gene --- three distinct variant types are shown in the bounding boxes.
  • Figure 3: Layout update within one step --- $n_i$ and $n_j$ are two nodes representing the nucleotide sequences "ACGTA" and "TTAC", respectively.
  • Figure 4: Scaling of odgi-layout.
  • Figure 5: Microarchitecture bottleneck analysis with VTune.
  • ...and 12 more figures