Lock-free de Bruijn graph
Daniel Górniak, Robert Nowak
TL;DR
The paper introduces a lock-free de Bruijn/A-Bruijn graph data structure and a parallel graph-building algorithm for de-novo genome assembly from reads. It uses a hash table of atomic Nodes and a compare-and-swap (CAS) based protocol to insert vertices and edges without locks, which enables scalable multi-threaded construction. The implementation, named LFdBG, includes normalization, contig finding, and a memory-pooling strategy. It is evaluated against Cuttlefish on E. coli, yeast, and human chromosome 1, with assembly metrics from QUAST; the results show strong speedups on smaller genomes and good scalability, at the cost of higher memory usage. The work provides a practical, MIT-licensed C++ library for lock-free graph assembly, aimed at parallel genome assembly on multi-core systems and large-scale sequencing data.
Abstract
The de Bruijn graph is one of the most important data structures used in de-novo genome assembly algorithms, especially for NGS data. There is a growing need for parallel data structures and algorithms due to the increasing number of cores in modern computers. The assembly task is an indispensable step in sequencing the genomes of new organisms and in studying structural genomic changes. In recent years, the rapid development of next-generation sequencing (NGS) methods has raised hopes of making whole-genome sequencing a fast and reliable tool used, for example, in medical diagnostics. However, this is hampered by the slowness and computational requirements of current processing algorithms, which creates a need for more efficient ones. One possible approach, still little explored, is the use of quantum computing. We created a lock-free version of the de Bruijn graph, as well as a lock-free algorithm to build such a graph from reads. Our algorithm and data structures are designed for parallel threads of execution and do not use mutexes or other locking mechanisms; instead, we use only the compare-and-swap instruction and other atomic operations. This makes our algorithm fast and allows it to scale efficiently with the number of threads. This article presents the new lock-free de Bruijn graph data structure together with the graph-building algorithm. We developed a C++ library and tested its performance to demonstrate its speed and scalability compared to other available tools.
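To make the idea of building the graph with compare-and-swap and other atomic operations concrete, here is a minimal C++ sketch. It is not the LFdBG implementation: the names `KmerTable`, `Node`, `encode`, `add_read`, and `build` are hypothetical, and the sketch assumes a fixed-capacity open-addressing table, k-mers of at most 31 bases over the alphabet ACGT, and no reverse-complement normalization. A vertex is claimed by a single CAS on an empty slot, and an outgoing edge is recorded with an atomic fetch-or, so no thread ever takes a lock.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <thread>
#include <vector>

// All names here are hypothetical; this is not the LFdBG API.
constexpr std::uint64_t kEmpty = ~std::uint64_t{0};  // marks an unused slot

// Pack k bases (k <= 31) starting at pos into 2 bits each: A=0, C=1, G=2, T=3.
// With k <= 31 the top bits stay zero, so a code can never equal kEmpty.
std::uint64_t encode(const std::string& s, std::size_t pos, std::size_t k) {
    assert(k >= 1 && k <= 31 && pos + k <= s.size());
    std::uint64_t code = 0;
    for (std::size_t i = 0; i < k; ++i) {
        std::uint64_t base = 3;  // 'T'; this sketch maps any other symbol to T too
        switch (s[pos + i]) {
            case 'A': base = 0; break;
            case 'C': base = 1; break;
            case 'G': base = 2; break;
            default: break;
        }
        code = (code << 2) | base;
    }
    return code;
}

struct Node {
    std::atomic<std::uint64_t> kmer{kEmpty};
    std::atomic<std::uint8_t> out_edges{0};  // bit b set: edge to the successor ending in base b
};

class KmerTable {
public:
    // Capacity must be a power of two; the table never grows in this sketch.
    explicit KmerTable(std::size_t capacity) : slots_(capacity) {
        assert(capacity != 0 && (capacity & (capacity - 1)) == 0);
    }

    // Returns the node holding kmer, inserting it if absent; nullptr if the table is full.
    Node* insert(std::uint64_t kmer) {
        const std::size_t mask = slots_.size() - 1;
        std::size_t i = hash(kmer) & mask;
        for (std::size_t probes = 0; probes < slots_.size(); ++probes) {
            std::uint64_t expected = kEmpty;
            // Either we claim the empty slot, or the CAS fails and reports the current owner.
            if (slots_[i].kmer.compare_exchange_strong(expected, kmer) || expected == kmer)
                return &slots_[i];
            i = (i + 1) & mask;  // slot owned by a different k-mer: linear probing
        }
        return nullptr;
    }

    const Node* find(std::uint64_t kmer) const {
        const std::size_t mask = slots_.size() - 1;
        std::size_t i = hash(kmer) & mask;
        for (std::size_t probes = 0; probes < slots_.size(); ++probes) {
            const std::uint64_t cur = slots_[i].kmer.load();
            if (cur == kmer) return &slots_[i];
            if (cur == kEmpty) return nullptr;
            i = (i + 1) & mask;
        }
        return nullptr;
    }

    std::size_t size() const {
        std::size_t count = 0;
        for (const Node& n : slots_)
            if (n.kmer.load() != kEmpty) ++count;
        return count;
    }

private:
    static std::size_t hash(std::uint64_t x) {
        return static_cast<std::size_t>((x * 0x9E3779B97F4A7C15ull) >> 32);
    }
    std::vector<Node> slots_;
};

// Insert every k-mer of one read and link consecutive k-mers with an edge.
void add_read(KmerTable& table, const std::string& read, std::size_t k) {
    Node* prev = nullptr;
    for (std::size_t pos = 0; pos + k <= read.size(); ++pos) {
        const std::uint64_t code = encode(read, pos, k);
        Node* cur = table.insert(code);
        if (cur == nullptr) return;  // table full: give up on this read
        if (prev != nullptr)
            prev->out_edges.fetch_or(static_cast<std::uint8_t>(1u << (code & 3)));
        prev = cur;
    }
}

// Each worker thread handles the reads whose index is congruent to its id.
void build(KmerTable& table, const std::vector<std::string>& reads,
           std::size_t k, unsigned n_threads) {
    std::vector<std::thread> workers;
    for (unsigned w = 0; w < n_threads; ++w)
        workers.emplace_back([&, w] {
            for (std::size_t r = w; r < reads.size(); r += n_threads)
                add_read(table, reads[r], k);
        });
    for (std::thread& t : workers) t.join();
}
```

A caller would construct a `KmerTable` sized well above the expected number of distinct k-mers, then call `build(table, reads, k, n_threads)`. The fixed capacity keeps the sketch short; a production structure would also need growth or memory pooling, which this example does not attempt to reproduce.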
