Accelerating DNA Read Mapping with Digital Processing-in-Memory

Rotem Ben-Hur, Orian Leitersdorf, Ronny Ronen, Lidor Goldshmidt, Idan Magram, Lior Kaplun, Leonid Yavitz, Shahar Kvatinsky

TL;DR

DART-PIM uses digital processing-in-memory (PIM) to accelerate the entire read-mapping process end to end, from indexing with a unique data organization schema to filtering and read alignment with an optimized Wagner-Fischer algorithm.

Abstract

Genome analysis has revolutionized fields such as personalized medicine and forensics. Modern sequencing machines generate vast amounts of fragmented strings of genome data called reads. Aligning these reads into the complete DNA sequence of an organism (the read-mapping process) requires extensive data transfer between processing units and memory, leading to execution bottlenecks. Prior studies have primarily focused on accelerating specific stages of the read-mapping task. In contrast, this paper introduces a holistic framework called DART-PIM that accelerates the entire read-mapping process. DART-PIM uses digital processing-in-memory (PIM) for end-to-end acceleration, from indexing with a unique data organization schema to filtering and read alignment with an optimized Wagner-Fischer algorithm. A comprehensive performance evaluation with real genomic data shows that DART-PIM achieves 5.7x and 257x higher throughput and 92x and 27x better energy efficiency than state-of-the-art GPU and PIM implementations, respectively.

Paper Structure

This paper contains 27 sections, 5 equations, 10 figures, 6 tables, and 2 algorithms.

Figures (10)

  • Figure 1: Genome sequence alignment process. The genomic samples are input into a sequencing machine that fragments them into small segments. The sequencing machine generates short strings known as reads, which are then processed during the read-mapping procedure. Read mapping involves an offline indexing stage followed by the online seeding, pre-alignment filtering, and read alignment stages.
  • Figure 2: Area allocation within a memory crossbar array: each row conducts a series of logical operations, starting from the $L$-bit inputs $I_x$ (blue) and $J_x$ (yellow) and using intermediate results (orange), to obtain the final output stored in the memory (green). All $n$ rows perform the same logical operations (over different inputs) concurrently. $WL_x$ and $BL_x$ are, respectively, wordline (row) and bitline (column) number $x$.
  • Figure 3: Mapping of a linear WF matrix calculation into a single crossbar row for $eth=6$. The reference segment (blue) and read (yellow) are the computation inputs. Only $2eth+1$ WF distances are needed at any point (green). To compute the current WF distance, only the distances in adjacent cells (storing the top and left WF matrix distances) and the previous value of the current cell (storing the top-left WF matrix distance) are required. The intermediate results generated while computing the distances are stored in temporary row cells (orange) and, due to the limited number of cells, are reused when necessary. The total number of bits in the row is 1024.
  • Figure 4: Example of the mapping of a banded WF matrix (with $eth=6$) onto the WF distances buffer in a single crossbar row (containing $2eth+1$ cells). The computed cell is denoted by light green "$B_3$" (location $(4,4)$). The remaining green cells are stored within the WF distances buffer during computation. The computation is performed using only the top, top-left, and left cells (marked with orange arrows). Upon completion, the computed value replaces the target cell's value in the WF distances buffer. Note that gray cells are not computed, as they represent distances larger than $eth$.
  • Figure 5: DART-PIM architecture featuring RISC-V cores and computing memristive memories. The memory consists of a single PIM module, which contains RISC-V cores with private L1 cache memories integrated within the memristive memory chips. Each chip is divided into banks, each equipped with a dedicated controller. Each crossbar in the bank is responsible for a single reference minimizer.
  • ...and 5 more figures
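
The banded WF scheme described in the captions of Figures 3 and 4 can be illustrated in software. The following is a minimal Python sketch, not the paper's in-memory implementation: the function name `banded_wf`, the saturation value `eth + 1` for out-of-band cells, and the choice of a global edit distance between the read and the reference segment are all assumptions made for illustration. It keeps a single buffer of $2eth+1$ distances and updates it in place, so that when a cell is computed its own previous value is the top-left distance, its right neighbor is the top distance, and its left neighbor is the left distance.

```python
def banded_wf(read: str, ref: str, eth: int):
    """Return the edit distance between read and ref if it is <= eth, else None.

    Illustrative sketch only: names and the saturation scheme are assumptions.
    """
    m, n = len(read), len(ref)
    if abs(n - m) > eth:
        return None  # the final matrix cell lies outside the band
    inf = eth + 1  # saturated value meaning "distance larger than eth"
    width = 2 * eth + 1
    # Buffer slot k holds matrix column j = i + k - eth of the current row i.
    # Row 0 of the WF matrix is D[0][j] = j; columns j < 0 do not exist.
    buf = [(k - eth) if k >= eth else inf for k in range(width)]
    for i in range(1, m + 1):
        for k in range(width):
            j = i + k - eth
            if j < 0 or j > n:
                buf[k] = inf  # outside the matrix
                continue
            if j == 0:
                buf[k] = min(i, inf)  # first column: D[i][0] = i
                continue
            # buf[k] still holds D[i-1][j-1] (top-left) at this point.
            diag = buf[k] + int(read[i - 1] != ref[j - 1])
            # buf[k+1] still holds D[i-1][j] (top); not yet overwritten.
            top = buf[k + 1] + 1 if k + 1 < width else inf
            # buf[k-1] already holds D[i][j-1] (left) from this row.
            left = buf[k - 1] + 1 if k > 0 else inf
            buf[k] = min(diag, top, left, inf)
    dist = buf[n - m + eth]  # slot of the final cell D[m][n]
    return dist if dist <= eth else None


if __name__ == "__main__":
    print(banded_wf("kitten", "sitting", 6))  # 3
    print(banded_wf("kitten", "sitting", 2))  # None
```

Saturating every value at `eth + 1` keeps each stored distance within a fixed, small bit width, which is one plausible reason a banded formulation fits a fixed-size crossbar row; the actual bit-level layout in the paper may differ.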