Processing-in-memory for genomics workloads
William Andrew Simon, Leonid Yavits, Konstantina Koliogeorgi, Yann Falevoz, Yoshihiro Shibuya, Dominique Lavenier, Irem Boybat, Klea Zambaku, Berkan Şahin, Mohammad Sadrosadati, Onur Mutlu, Abu Sebastian, Rayan Chikhi, The BioPIM Consortium, Can Alkan
TL;DR
This paper presents the BioPIM Project's processing-in-memory (PIM) initiatives, spanning processing-near-memory (PnM) and processing-using-memory (PuM), to tackle the data movement bottlenecks of large-scale genomics. It demonstrates PnM acceleration for alignment, read mapping, k-mer indexing, and variant calling on UPMEM platforms, with substantial speedups and energy reductions, alongside PuM-based basecalling and on-chip pathogen classification using memory-centric architectures such as CiMBA and SAS-CAM. The results indicate that memory-centric computing can dramatically reduce data transfer needs while maintaining accuracy, enabling real-time, field-deployable genomic workflows. The authors also discuss the need for domain-specific APIs and software layers to ease adoption across diverse hardware platforms and future generations of PIM technologies.
Abstract
Low-cost, high-throughput DNA and RNA sequencing (HTS) data are the backbone of the life sciences. Genome sequencing is now becoming a part of Predictive, Preventive, Personalized, and Participatory (termed 'P4') medicine. All genomic data are currently processed in energy-hungry computer clusters and data centers, which requires transferring the data, consumes substantial energy, and wastes valuable time. Therefore, there is a need for fast, energy-efficient, and cost-efficient technologies that enable genomics research without requiring data centers and cloud platforms. We recently launched the BioPIM Project to leverage emerging processing-in-memory (PIM) technologies to enable energy- and cost-efficient analysis of bioinformatics workloads. The BioPIM Project focuses on co-designing algorithms and data structures commonly used in genomics with several PIM architectures to achieve the greatest savings in cost, energy, and time.
