Optimizing Sequence Alignment with Scored NFAs
Ryan Karbowniczak, Rasha Karakchi
TL;DR
The paper tackles the challenge of identifying the best-scoring sequence-alignment path within nondeterministic finite automata by augmenting the NAPOLY FPGA accelerator to a scored variant called NAPOLY+. NAPOLY+ introduces per-STE registers and an arithmetic unit (STE+) to accumulate and compare path scores, enabling additive scoring and best-path selection in hardware. The approach leverages weighted finite automata concepts to compute the maximum scoring path and reports on design viability and performance through experiments on Zynq UltraScale+ devices across 1K–64K array sizes, showing higher functional capability and throughput relative to the original NAPOLY, with memory usage growing with array size. The work demonstrates the practical potential of score-aware automata in sequence alignment and suggests avenues for future optimization (interconnects, memory hierarchies) and extensions to other domains like machine learning and graph processing on newer FPGA platforms.
Abstract
The rapid growth of symbolic data has underscored the importance of pattern matching and regular expression processing. Nondeterministic finite automata (NFAs) are commonly used for these tasks, but they can only detect matches; they cannot determine which match is best. This research extends the NAPOLY pattern-matching accelerator with NAPOLY+, which adds registers to each processing element to store values such as scores, weights, or edge costs. Together with a newly added arithmetic unit in each processing element, these registers allow NAPOLY+ to identify the highest score, which corresponds to the best match in sequence alignment tasks. The design was evaluated against the original NAPOLY, with results showing that NAPOLY+ offers superior functionality and improved performance in identifying the best match. The design was implemented and tested on zynq102 and zynq104 FPGA devices, with performance metrics compared across array sizes from 1K to 64K processing elements. Memory usage increased proportionally with array size, while Fmax decreased as the array size grew on both platforms. The reported findings cover only the core array and exclude the impact of buffers and DRAM.
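To make the idea of per-element score registers concrete, the following is a minimal software sketch of a scored NFA, not the NAPOLY+ hardware design itself. It assumes a homogeneous automaton in which each element accepts a set of symbols, adds a fixed weight when it fires, and keeps the maximum score over its active predecessors. The names `ScoredSTE` and `best_score`, the per-element `weight` field, and the rule that start elements may begin a match at any input position are all illustrative assumptions.

```python
from dataclasses import dataclass, field

NEG_INF = float("-inf")  # marks an inactive element (no path reaches it)


@dataclass
class ScoredSTE:
    """Hypothetical scored state-transition element (illustrative only)."""
    symbols: frozenset          # input symbols this element accepts
    weight: int                 # score added when the element fires (assumed)
    start: bool = False         # may begin a match at any input position
    report: bool = False        # firing here counts as a completed match
    preds: list = field(default_factory=list)  # indices of predecessors


def best_score(stes, text):
    """Return the highest score of any reported match, or -inf if none."""
    scores = [NEG_INF] * len(stes)   # one score register per element
    best = NEG_INF
    for ch in text:
        nxt = [NEG_INF] * len(stes)
        for i, ste in enumerate(stes):
            if ch not in ste.symbols:
                continue
            # Compare step: keep the best score among active predecessors.
            incoming = max((scores[p] for p in ste.preds), default=NEG_INF)
            if ste.start:
                incoming = max(incoming, 0)
            if incoming == NEG_INF:
                continue             # no active path leads into this element
            # Accumulate step: add this element's weight to the path score.
            nxt[i] = incoming + ste.weight
            if ste.report:
                best = max(best, nxt[i])
        scores = nxt
    return best


# Example automaton: "AC" followed by "G" (weight 3) or by "G"/"T" (weight 1).
stes = [
    ScoredSTE(frozenset("A"), weight=1, start=True),
    ScoredSTE(frozenset("C"), weight=1, preds=[0]),
    ScoredSTE(frozenset("G"), weight=3, preds=[1], report=True),
    ScoredSTE(frozenset("GT"), weight=1, preds=[1], report=True),
]

print(best_score(stes, "ACG"))  # both branches match; the heavier one wins: 5
print(best_score(stes, "ACT"))  # only the second branch matches: 3
```

On the input "ACG" both reporting elements fire, and the comparison keeps the higher of the two path scores, which is the behavior a plain NFA cannot express because it only signals that some match occurred.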
