Accelerating Seed Location Filtering in DNA Read Mapping Using a Commercial Compute-in-SRAM Architecture
Courtney Golden, Dan Ilan, Nicholas Cebry, Christopher Batten
TL;DR
This paper speeds up the filtering stage of reference-guided DNA read mapping by offloading Myers' bit-parallel edit-distance calculation to a commercial compute-in-SRAM accelerator (the Gemini APU). It provides a microcode-level architectural and programming model for the Gemini APU and shows how Myers' algorithm can be mapped onto this massively parallel, bit-sliced hardware to process thousands of candidate alignments in parallel. The results show substantial end-to-end speedups over a single-core CPU (average $14.1\times$, up to $24.1\times$ for some queries), identify kernel computation and data movement as the dominant runtime components, and indicate that performance scales as candidate counts grow. The work suggests that compute-in-SRAM is well-suited to DNA read filtering and could influence future accelerator designs and genomics pipelines, and it outlines opportunities for multicore expansion and longer-read scenarios.
Abstract
DNA sequence alignment is an important workload in computational genomics. Reference-guided DNA assembly involves aligning many read sequences against candidate locations in a long reference genome. To reduce the computational load of this alignment, candidate locations can be pre-filtered using simpler alignment algorithms such as edit distance. Prior work has explored accelerating filtering on simulated compute-in-DRAM, motivated by the massive parallelism of compute-in-memory architectures. In this paper, we present work-in-progress on accelerating filtering using a commercial compute-in-SRAM accelerator. We leverage the recently released Gemini accelerator platform from GSI Technology, which is, to our knowledge, the first commercial-scale compute-in-SRAM system. We accelerate Myers' bit-parallel edit distance algorithm, achieving average speedups of 14.1x over single-core CPU performance. Individual query/candidate alignments achieve speedups of up to 24.1x. These early results suggest this novel architecture is well-suited to accelerating the filtering step of sequence-to-sequence DNA alignment.
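To make the filtering kernel concrete, the following is a minimal CPU-side sketch of Myers' bit-parallel edit distance in Python, using the well-known global-distance formulation in which the horizontal delta shifted into row 0 is +1. It is an illustration under stated assumptions, not the Gemini microcode described in the paper: the function names `myers_edit_distance` and `filter_candidates`, the `max_edits` threshold, and the use of global (rather than semi-global) distance are all assumptions made here.

```python
from typing import Dict, List


def myers_edit_distance(pattern: str, text: str) -> int:
    """Global edit distance via Myers' bit-vector recurrence (illustrative sketch)."""
    m = len(pattern)
    if m == 0:
        return len(text)

    mask = (1 << m) - 1   # keep only the m bits that represent pattern rows
    top = 1 << (m - 1)    # bit for the last pattern row, which carries the score

    # peq[c] has bit i set when pattern[i] == c.
    peq: Dict[str, int] = {}
    for i, ch in enumerate(pattern):
        peq[ch] = peq.get(ch, 0) | (1 << i)

    pv, mv = mask, 0      # vertical deltas: +1 everywhere in the first column
    score = m             # distance between the pattern and the empty text

    for ch in text:
        eq = peq.get(ch, 0)
        xv = eq | mv
        xh = ((((eq & pv) + pv) ^ pv) | eq) & mask
        ph = (mv | ~(xh | pv)) & mask   # rows whose horizontal delta is +1
        mh = pv & xh                    # rows whose horizontal delta is -1

        if ph & top:
            score += 1
        elif mh & top:
            score -= 1

        # Shift in a +1 horizontal delta for row 0 (global alignment boundary).
        ph = ((ph << 1) | 1) & mask
        mh = (mh << 1) & mask
        pv = (mh | ~(xv | ph)) & mask
        mv = ph & xv

    return score


def filter_candidates(read: str, candidates: List[str], max_edits: int) -> List[int]:
    """Return indices of candidate windows within max_edits of the read (hypothetical helper)."""
    return [i for i, window in enumerate(candidates)
            if myers_edit_distance(read, window) <= max_edits]
```

For example, `filter_candidates("ACGTACGT", ["ACGTACGT", "ACGTTCGT", "TTTTTTTT"], 1)` keeps the first two windows and discards the third. Each candidate is processed independently with only bitwise operations, additions, and shifts, which is what makes the recurrence a natural fit for evaluating many candidates in parallel on bit-sliced hardware.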
