GateKeeper-GPU: Fast and Accurate Pre-Alignment Filtering in Short Read Mapping

Zülal Bingöl, Mohammed Alser, Onur Mutlu, Ozcan Ozturk, Can Alkan

TL;DR

This work tackles the computational bottleneck in short read mapping caused by the verification step, which traditionally relies on quadratic-time dynamic programming. It introduces GateKeeper-GPU, a CUDA-based pre-alignment filter that improves filtering accuracy over prior GateKeeper implementations and uses GPU parallelism to assess many read-reference pairs concurrently. The authors integrate GateKeeper-GPU with mrFAST, evaluate accuracy, throughput, and resource use on real and simulated data, and report end-to-end speedups of up to $1.4\times$ and verification speedups of up to $2.9\times$. The results indicate that GateKeeper-GPU is a practical, scalable addition to read mapping pipelines. The paper also describes two encoding modes and discusses performance trade-offs and future optimizations.

Abstract

At the last step of short read mapping, the candidate locations of the reads on the reference genome are verified: sequence alignment algorithms compute the differences between each read and the corresponding reference segment. Calculating the similarities and differences between two sequences is still computationally expensive, since approximate string matching techniques traditionally rely on dynamic programming algorithms with quadratic time and space complexity. We introduce GateKeeper-GPU, a fast and accurate pre-alignment filter that efficiently reduces the need for expensive sequence alignment. GateKeeper-GPU provides two main contributions: first, it improves the filtering accuracy of GateKeeper (a lightweight pre-alignment filter), and second, it exploits the massive parallelism provided by the large number of threads on modern GPUs to examine numerous sequence pairs rapidly and concurrently. By reducing the alignment workload, GateKeeper-GPU accelerates sequence alignment by 2.9x and speeds up the end-to-end execution time of a comprehensive read mapper (mrFAST) by up to 1.4x. GateKeeper-GPU is available at https://github.com/BilkentCompGen/GateKeeper-GPU.
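To make the cost of verification concrete, the following is a minimal sketch of the classic dynamic programming recurrence for edit distance. It is a textbook illustration, not code from GateKeeper-GPU or mrFAST; the names `edit_distance`, `passes_verification`, and `max_edits` are hypothetical and chosen only for this example.

```python
def edit_distance(read: str, ref: str) -> int:
    """Edit (Levenshtein) distance via the textbook DP recurrence.

    Time is O(len(read) * len(ref)). Only two rows are kept here, so
    memory is linear; recovering the alignment itself would need the
    full quadratic table.
    """
    # prev[j] = distance between the empty read prefix and ref[:j]
    prev = list(range(len(ref) + 1))
    for i, read_base in enumerate(read, start=1):
        curr = [i]  # distance between read[:i] and the empty reference prefix
        for j, ref_base in enumerate(ref, start=1):
            cost = 0 if read_base == ref_base else 1
            curr.append(min(
                prev[j] + 1,         # drop a base from the read
                curr[j - 1] + 1,     # drop a base from the reference
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1]


def passes_verification(read: str, ref: str, max_edits: int) -> bool:
    """Hypothetical helper: accept a candidate location within max_edits."""
    return edit_distance(read, ref) <= max_edits
```

Every candidate location that reaches this step costs one such quadratic computation, which is why rejecting dissimilar pairs with a cheaper filter beforehand reduces total mapping time.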

Paper Structure

This paper contains 28 sections, 8 figures, 6 tables.

Figures (8)

  • Figure 1: High-level view of seed-and-extend paradigm.
  • Figure 2: Strategy for improving leading and trailing 0 bits. Reference and Shifted Read show bitvectors of candidate reference segment and shifted read, respectively; H and A represent Hamming mask and amended mask.
  • Figure 3: Amended masks produced by GateKeeper (Alser et al., 2017) and GateKeeper-GPU.
  • Figure 4: False accept analysis - 100bp.
  • Figure 5: False accept comparison for Set_1 with a read length of 100bp; the number of undefined pairs is 28,009.
  • ...and 3 more figures