
Parallel DNA Sequence Alignment on High-Performance Systems with CUDA and MPI

Linus Zwaka

TL;DR

The paper tackles the computational bottleneck of global sequence alignment by implementing Needleman-Wunsch on a hybrid CUDA-MPI platform: a per-cell parallel CUDA kernel handles single alignments, and MPI distributes many alignments across a cluster. It uses a center-star heuristic for multiple sequence alignment and a per-cell dependency grid with spin-wait synchronization to minimize idle time, achieving substantial speedups over CPU baselines. Strong and weak scaling analyses demonstrate effective use of GPU and distributed resources for large-scale alignments, and the overall time for the pairwise alignments scales as $O(n^2 k^2)$. The work highlights the potential of high-performance computing to advance sequence alignment workflows with minimal changes to the core dynamic programming (DP) algorithm, paving the way for genome-scale analyses on HPC systems.
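To make the dependency-grid idea concrete, here is a minimal CPU sketch in Python, not the paper's CUDA kernel: each interior cell of the DP matrix waits until its up, left, and diagonal neighbours are flagged ready, computes its score, and then raises its own flag. The scoring values (match +1, mismatch -1, linear gap -1), the function name `nw_score_spinwait`, and the round-robin assignment of rows to threads are assumptions for illustration; Python threads stand in for GPU threads and give no speedup here.

```python
import threading
import time

# Illustrative scoring scheme (assumption; the paper's parameters may differ).
MATCH, MISMATCH, GAP = 1, -1, -1


def nw_score_spinwait(a: str, b: str, num_threads: int = 4) -> int:
    """Fill the Needleman-Wunsch matrix using a per-cell dependency grid.

    Hypothetical CPU analogue: every interior cell spins until its up, left,
    and diagonal neighbours are flagged ready, then computes its score and
    publishes its own flag. Rows are dealt out round-robin to the threads.
    """
    m, n = len(a), len(b)
    H = [[0] * (n + 1) for _ in range(m + 1)]
    ready = [[False] * (n + 1) for _ in range(m + 1)]

    # Boundary row and column hold cumulative gap penalties and are ready at once.
    for i in range(m + 1):
        H[i][0], ready[i][0] = i * GAP, True
    for j in range(n + 1):
        H[0][j], ready[0][j] = j * GAP, True

    def worker(t: int) -> None:
        for i in range(1 + t, m + 1, num_threads):  # row-cyclic assignment
            for j in range(1, n + 1):
                # Spin-wait until the three predecessor cells are published.
                while not (ready[i - 1][j] and ready[i - 1][j - 1] and ready[i][j - 1]):
                    time.sleep(0)  # yield so the producing thread can run
                diag = H[i - 1][j - 1] + (MATCH if a[i - 1] == b[j - 1] else MISMATCH)
                H[i][j] = max(diag, H[i - 1][j] + GAP, H[i][j - 1] + GAP)
                ready[i][j] = True  # score written first, then the flag is raised

    threads = [threading.Thread(target=worker, args=(t,)) for t in range(num_threads)]
    for th in threads:
        th.start()
    for th in threads:
        th.join()
    return H[m][n]
```

For example, `nw_score_spinwait("GATTACA", "GCATGCU", num_threads=3)` returns 0 under these scores. Dealing out whole rows round-robin means the thread owning row i only ever waits on row i-1, which another thread is already advancing, so the spin-wait cannot deadlock; a GPU version that launches one thread per cell needs its own guarantee that waiting threads never block the threads they depend on.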

Abstract

Sequence alignment is a cornerstone of bioinformatics, widely used to identify similarities between DNA, RNA, and protein sequences and to study evolutionary relationships and functional properties. The Needleman-Wunsch algorithm remains a robust and accurate method for global sequence alignment. However, its O(mn) computational complexity for sequences of lengths m and n poses significant challenges when processing large-scale datasets or performing multiple sequence alignments. To address these limitations, this work proposes a hybrid implementation of the Needleman-Wunsch algorithm that uses CUDA for parallel execution on GPUs and MPI for distributed computation across multiple nodes of a supercomputer. CUDA offloads the computationally intensive work to GPU cores, while MPI handles communication and workload distribution across nodes for large-scale alignments. The paper details the implementation and performance evaluation of the algorithm in this massively parallel environment. Experimental results show significant acceleration of the alignment process compared to traditional CPU-based implementations, particularly for large input sizes and multiple sequence alignments. In summary, combining CUDA and MPI effectively overcomes the computational bottlenecks inherent to the Needleman-Wunsch algorithm without substantial modifications to the underlying algorithm, highlighting the potential of high-performance computing to advance sequence alignment workflows.
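The center-star heuristic needs a global alignment score for every pair of sequences, which is where both the O(mn) cost and the opportunity for MPI come from: with k sequences of length about n there are k(k-1)/2 independent pairwise alignments of O(n^2) each, hence O(n^2 k^2) overall. The following is a hedged sketch, not the paper's code: `nw_score` is a plain two-row Needleman-Wunsch scorer with assumed scores (match +1, mismatch -1, gap -1), the rule `pair % size == rank` is a hypothetical way to split pairs across MPI ranks, and the loop over ranks in `center_star_center` only simulates, on one process, what an MPI gather would collect.

```python
from itertools import combinations

# Illustrative scoring scheme (assumption; the paper's parameters may differ).
MATCH, MISMATCH, GAP = 1, -1, -1


def nw_score(a: str, b: str) -> int:
    """Needleman-Wunsch global alignment score in O(len(a) * len(b)) time.

    Only two DP rows are kept, so memory is O(len(b)); the full matrix would
    be needed to trace back the alignment itself.
    """
    prev = [j * GAP for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        cur = [i * GAP] + [0] * len(b)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (MATCH if a[i - 1] == b[j - 1] else MISMATCH)
            cur[j] = max(diag, prev[j] + GAP, cur[j - 1] + GAP)
        prev = cur
    return prev[-1]


def pair_scores_for_rank(seqs: list[str], rank: int, size: int) -> dict:
    """Score only the pairs this (hypothetical) rank owns: pair p -> rank p % size."""
    out = {}
    for p, (x, y) in enumerate(combinations(range(len(seqs)), 2)):
        if p % size == rank:
            out[(x, y)] = nw_score(seqs[x], seqs[y])
    return out


def center_star_center(seqs: list[str], size: int = 1) -> int:
    """Return the index of the sequence with the largest summed pairwise score.

    Merging every rank's dictionary here stands in for an MPI gather.
    """
    scores = {}
    for rank in range(size):
        scores.update(pair_scores_for_rank(seqs, rank, size))
    totals = [0] * len(seqs)
    for (x, y), s in scores.items():
        totals[x] += s
        totals[y] += s
    return max(range(len(seqs)), key=totals.__getitem__)
```

With `seqs = ["AAAA", "AAAT", "AATT"]`, the summed scores are 2, 4, and 2, so `center_star_center(seqs, size=2)` returns 1 however many simulated ranks share the work. In a real MPI program, `rank` and `size` would come from the communicator, and the partial dictionaries would be gathered on one rank before the center is chosen.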

Paper Structure

This paper contains 13 sections, 2 equations, and 5 figures.

Figures (5)

  • Figure 1: An example two-dimensional alignment grid with primary data as described by the Needleman-Wunsch algorithm.
  • Figure 2: A continuation of the alignment shown in Figure 1 after applying the scoring algorithm.
  • Figure 3: Strong scaling comparison of a COL1A1 sequence alignment between mouse and human.
  • Figure 4: Strong scaling comparison of a THAP11 sequence alignment between mouse and human.
  • Figure 5: Weak scaling comparison of execution times as problem size increases.
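Figures 3 to 5 report strong and weak scaling. As a reading aid, these are the standard definitions, where $T(p)$ is the wall-clock time on $p$ processing units (GPUs or MPI ranks); the paper's plots may present raw execution times rather than these derived quantities.

```latex
S(p) = \frac{T(1)}{T(p)}, \qquad
E_{\mathrm{strong}}(p) = \frac{S(p)}{p} \quad \text{(fixed total problem size)}, \qquad
E_{\mathrm{weak}}(p) = \frac{T(1)}{T(p)} \quad \text{(fixed problem size per unit)}
```

Ideal strong scaling keeps $E_{\mathrm{strong}}$ near 1 as $p$ grows; ideal weak scaling keeps the execution time flat, that is, $E_{\mathrm{weak}}$ near 1.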