GeneTEK: Low-power, high-performance and scalable FPGA architecture for exact unit-cost edit distance
Elena Espinosa, Rubén Rodríguez Álvarez, José Miranda, Rafael Larrosa, Miguel Peón-Quirós, Oscar Plata, David Atienza
Abstract
The advent of next-generation sequencing (NGS) has revolutionized genomic research by enabling cost-effective, high-throughput sequencing of a diverse range of organisms. This breakthrough has unleashed a "Cambrian explosion" in genomic data volume and diversity. This volume of workloads places genomics among the top four big data challenges anticipated for this decade. In this context, pairwise sequence alignment represents a very time- and energy-intensive step in common bioinformatics pipelines. Speeding up this step requires the implementation of heuristic approaches, optimized algorithms, and/or hardware acceleration. Although state-of-the-art CPU and GPU implementations have demonstrated significant performance gains, recent FPGA implementations have shown improved energy efficiency. However, the latter often suffer from limited read-length scalability due to constraints on hardware resources when aligning longer sequences. In this work, we present a flexible FPGA-based accelerator template scalable up to 1000 bp that implements Myers's algorithm to compute exact unit-cost edit-distance using high-level synthesis and a worker-based architecture. GeneTEK, a set of instances of this accelerator template in a Xilinx Zynq UltraScale+ FPGA, achieves up to 113% increase in execution speed and up to 111x reduction in energy consumption compared to leading CPU and GPU solutions, while fitting comparison matrices up to 13x larger than previous FPGA-based systolic-array solutions. By following a SW-HW co-design approach, GeneTEK exploits parallelization at multiple levels and efficient memory use to deliver a scalable and accurate FPGA-based accelerator. These results reaffirm the potential of FPGAs as an energy-efficient platform for pairwise alignment of read-lengths up to 1000 bp.
