GenDRAM: Hardware-Software Co-Design of General Platform in DRAM

Tsung-Han Lu; Weihong Xu; Tajana Rosing

TL;DR

GenDRAM is a massively parallel processing-in-memory (PIM) accelerator that leverages the capacity and internal bandwidth of monolithic 3D DRAM (M3D DRAM) to integrate entire data-intensive pipelines, such as the full genomics workflow from seeding to alignment, onto a single heterogeneous chip.

Abstract

Dynamic programming (DP) algorithms, such as All-Pairs Shortest Path (APSP) and genomic sequence alignment, are fundamental to many scientific domains but are severely bottlenecked by data movement on conventional architectures. While Processing-in-Memory (PIM) offers a promising solution, existing accelerators often address only a fraction of the workflow, creating new system-level bottlenecks in host-accelerator communication and off-chip data streaming. In this work, we propose GenDRAM, a massively parallel PIM accelerator that overcomes these limitations. GenDRAM leverages the immense capacity and internal bandwidth of monolithic 3D DRAM (M3D DRAM) to integrate entire data-intensive pipelines, such as the full genomics workflow from seeding to alignment, onto a single heterogeneous chip. At its core is a novel architecture featuring specialized Search PUs for memory-intensive tasks and universal, multiplier-less Compute PUs for diverse DP calculations. This is enabled by a 3D-aware data mapping strategy that exploits the tiered latency of M3D DRAM for performance optimization. Through comprehensive simulation, we demonstrate that GenDRAM outperforms state-of-the-art GPU systems by over 68x on APSP and over 22x on the end-to-end genomics pipeline.
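To illustrate why the two DP workloads named in the abstract can share multiplier-less compute units, the following minimal Python sketch writes both recurrences using only additions and minimum comparisons. It is an illustration under stated assumptions, not the paper's implementation: the function names are hypothetical, and plain edit distance stands in for the alignment scoring scheme, which this page does not specify.

```python
import math

INF = math.inf  # marks "no edge" in the distance matrix


def floyd_warshall(dist):
    """All-pairs shortest paths on an n x n list-of-lists distance matrix.

    Each update is d[i][j] = min(d[i][j], d[i][k] + d[k][j]):
    one addition and one comparison, no multiplication.
    """
    n = len(dist)
    d = [row[:] for row in dist]  # work on a copy
    for k in range(n):
        for i in range(n):
            dik = d[i][k]
            for j in range(n):
                cand = dik + d[k][j]
                if cand < d[i][j]:
                    d[i][j] = cand
    return d


def edit_distance(a, b):
    """Unit-cost edit distance (a stand-in for alignment scoring).

    Each cell is the minimum of three neighbours plus a small penalty,
    so it also needs only additions and comparisons.
    """
    prev = list(range(len(b) + 1))  # DP row for the empty prefix of a
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # match or mismatch
        prev = cur
    return prev[-1]
```

Both functions reduce to the same add-then-select step applied over a grid, which is the structural similarity the abstract relies on when it describes the Compute PUs as universal.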

Paper Structure (45 sections, 2 equations, 22 figures, 2 tables, 1 algorithm)

Introduction
Background
APSP and Genomic Sequence Alignment
APSP and FW algorithm
Genomic Alignment
Unified View of Dynamic Programming Workloads
Computational isomorphism
Structural commonality
Monolithic 3D-Stackable DRAM
Challenges on M3D DRAM for Dynamic Programming
GenDRAM Hardware Architecture
GenDRAM Overview
M3D DRAM-Logic Co-Optimization
Bandwidth-Matched PU Scaling
Hybrid Bonding Interface
...and 30 more sections

Figures (22)

Figure 1: Example of a weighted directed graph (right) and its corresponding adjacency matrix (left), serving as the input for the APSP problem
Figure 2: Three phases of the Blocked FW algorithm for updating a distance matrix. Orange arrows indicate data dependencies or computation within a block or between blocks and pivot elements
Figure 3: Overview of the genomic sequence alignment pipeline (left) and illustration of basic operations in sequence alignment (right): match, deletion, insertion, and mismatch. These operations are associated with different scores or penalties in dynamic programming algorithms
Figure 4: Illustration of three variants of DP alignment algorithms. (a) Original full DP. (b) Banded difference-based DP. (c) Adaptive banded parallelized DP [xu2023rapidx]
Figure 5: The GenDRAM Unified Computing Abstraction. Despite differing data access patterns (Global Broadcast in APSP vs. Local Wavefront in Genomics), both workloads map to a unified Generalized Grid Update problem based on semi-ring algebra
...and 17 more figures
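The three phases of Blocked FW named in the Figure 2 caption can be sketched as follows. This is the textbook blocked Floyd-Warshall formulation, assumed here for illustration; the paper's actual tiling and mapping onto M3D DRAM are not shown on this page, and the names `blocked_floyd_warshall`, `_relax_block`, and the block size parameter `b` are hypothetical.

```python
import math

INF = math.inf  # marks "no edge" in the distance matrix


def _relax_block(d, ib, jb, kb, b, n):
    """Relax block (ib, jb) through the pivot indices of block kb."""
    for k in range(kb * b, min((kb + 1) * b, n)):
        for i in range(ib * b, min((ib + 1) * b, n)):
            for j in range(jb * b, min((jb + 1) * b, n)):
                cand = d[i][k] + d[k][j]
                if cand < d[i][j]:
                    d[i][j] = cand


def blocked_floyd_warshall(dist, b):
    """All-pairs shortest paths computed block by block with tile size b."""
    n = len(dist)
    d = [row[:] for row in dist]  # work on a copy
    nb = (n + b - 1) // b         # number of blocks per dimension
    for kb in range(nb):
        # Phase 1: the pivot (diagonal) block depends only on itself.
        _relax_block(d, kb, kb, kb, b, n)
        # Phase 2: blocks sharing the pivot's row or column
        # depend on the pivot block and on themselves.
        for t in range(nb):
            if t != kb:
                _relax_block(d, kb, t, kb, b, n)  # pivot row
                _relax_block(d, t, kb, kb, b, n)  # pivot column
        # Phase 3: all remaining blocks depend on one pivot-row block
        # and one pivot-column block, so they are mutually independent.
        for ib in range(nb):
            for jb in range(nb):
                if ib != kb and jb != kb:
                    _relax_block(d, ib, jb, kb, b, n)
    return d
```

The phase 3 updates do not depend on one another within a round, which is what makes the blocked form a natural fit for many parallel processing units.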
