CiMBA: Accelerating Genome Sequencing through On-Device Basecalling via Compute-in-Memory

William Andrew Simon, Irem Boybat, Riselda Kodra, Elena Ferro, Gagandeep Singh, Mohammed Alser, Shubham Jain, Hsinyu Tsai, Geoffrey W. Burr, Onur Mutlu, Abu Sebastian

TL;DR

The paper tackles the bottleneck of basecalling in genome sequencing by introducing CiMBA, an embedded Compute-in-Memory accelerator that co-designs hardware with analog-aware DNNs (AL-Dorado) to enable real-time, on-device basecalling. It combines 11 PCM CiM tiles, a 2D mesh interconnect, a Digital Processing Unit, LookAround decoding, and a signal buffer to sustain high-throughput basecalling with low energy and area, significantly reducing data movement. Through analog-aware training and drift mitigation, AL-Dorado achieves near-SotA accuracy while delivering $4.77\times 10^6$ bases/s throughput and strong energy/area efficiency ($17\times$ and $27\times$ improvements, respectively) over prior embedded accelerators, and reduces communication overhead by about $43.7\times$. This approach enables streaming, on-device sequencing workflows, including metagenomics, with substantial implications for portable genomics and real-time analysis at the edge.

Abstract

As genome sequencing is finding utility in a wide variety of domains beyond the confines of traditional medical settings, its computational pipeline faces two significant challenges. First, the creation of up to 0.5 GB of data per minute imposes substantial communication and storage overheads. Second, the sequencing pipeline is bottlenecked at the basecalling step, consuming >40% of genome analysis time. A range of proposals have attempted to address these challenges, with limited success. We propose to address these challenges with a Compute-in-Memory Basecalling Accelerator (CiMBA), the first embedded ($\sim25$mm$^2$) accelerator capable of real-time, on-device basecalling, coupled with AnaLog (AL)-Dorado, a new family of analog-focused basecalling DNNs. Our resulting hardware/software co-design greatly reduces data communication overhead, is capable of a throughput of 4.77 million bases per second, 24x that required for real-time operation, and achieves 17x/27x power/area efficiency over the best prior basecalling embedded accelerator while maintaining a high accuracy comparable to state-of-the-art software basecallers.
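The headline figures in the abstract can be related with simple arithmetic. The sketch below is only a back-of-the-envelope check: the function names are hypothetical, the real-time rate is derived from the quoted 24x headroom rather than stated in the abstract, and the data-rate conversion assumes decimal units (1 GB = 1000 MB).

```python
# Back-of-the-envelope arithmetic using only figures quoted in the abstract.
# Function names are illustrative and do not come from the paper.

THROUGHPUT_BASES_PER_S = 4.77e6  # CiMBA throughput quoted in the abstract
REALTIME_HEADROOM = 24           # "24x that required for real-time operation"
RAW_DATA_GB_PER_MIN = 0.5        # upper bound on raw data quoted in the abstract


def implied_realtime_rate(throughput, headroom):
    """Basecalling rate implied as the real-time requirement (bases/s)."""
    return throughput / headroom


def raw_data_rate_mb_per_s(gb_per_min, mb_per_gb=1000):
    """Convert GB/minute to MB/s (assumes decimal GB by default)."""
    return gb_per_min * mb_per_gb / 60


if __name__ == "__main__":
    rate = implied_realtime_rate(THROUGHPUT_BASES_PER_S, REALTIME_HEADROOM)
    data = raw_data_rate_mb_per_s(RAW_DATA_GB_PER_MIN)
    print(f"Implied real-time requirement: {rate:,.0f} bases/s")
    print(f"Raw signal data rate: {data:.2f} MB/s")
```

Under these assumptions, the real-time requirement works out to roughly 2 x 10^5 bases per second, and the raw signal stream to roughly 8 MB/s.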

Paper Structure

This paper contains 37 sections, 16 figures, 3 tables.

Figures (16)

  • Figure 1: Size comparison of (a) the MinION Mk1C device that features MinION sequencing device and TX2 embedded GPU, and (b) the standalone MinION device along with our proposed CiMBA basecalling processor.
  • Figure 2: The MinION produces $\sim$0.5 GB of raw signal data per minute to be streamed to a workstation for basecalling, which then accounts for 40% (NVIDIA Xavier AGX) to 86% (Xeon W-10885M) of sequencing pipeline time due to the large parameter counts and DRAM access costs common to LSTM DNNs 9563028.
  • Figure 3: CRF-CTC decoding (state length=1 for simplicity): DNN outputs represent log-likelihoods of state-transitions (❶). Path likelihood is accumulated across timesteps (❷/❹), and gradient of the final value w.r.t. the inputs (❸) is iteratively evaluated to identify each timestep's most likely transition (❺).
  • Figure 4: (a) Mapping a DNN layer to (b) an array of eNVM cells (c) enables massively parallel MAC operations.
  • Figure 5: Mapping of AL-Dorado (Figure \ref{fig:dorado}) on the CiMBA architecture.
  • ...and 11 more figures