ABI: A tightly integrated, unified, sparsity-aware, reconfigurable, compute near-register file/cache GPU architecture with light-weight softmax for deep learning, linear algebra, and Ising compute
Siddhartha Raman Sundara Raman, Jaydeep P. Kulkarni
TL;DR
The paper proposes ABI, a tightly integrated, sparsity-aware, reconfigurable near-memory GPU architecture that targets the data-movement bottleneck between memory and the ALUs across CNN, GCN, linear programming (LP), Ising, and LLM workloads. It combines near-register-file and near-memory logic with a 5-stage Reconfigurable Compute Engine and a lightweight near-memory softmax, plus adaptive sparsity circuitry and dynamic resolution up to INT16. The design achieves 6–16x speedups and 6–13x energy savings over a baseline MIAOW GPU. The sparsity-aware circuit and the softmax circuit contribute about 1.5x and 1.6x energy savings, respectively, and the design reaches about 370 GOPS/W at 250 MHz. Test-chip measurements of the unified architecture across these workloads back the reported performance and energy-efficiency gains, which suggests that near-memory compute is practical for future GPUs and accelerators.
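To put the efficiency figure above in more familiar terms, 370 GOPS/W can be converted into energy per operation, since one watt is one joule per second. The short Python sketch below does only that unit conversion. The function name `energy_per_op_pj` is an illustrative assumption, and the summary does not define what counts as one "operation", so the result is a rough scale rather than a measured quantity.

```python
def energy_per_op_pj(gops_per_watt: float) -> float:
    """Convert an efficiency in GOPS/W to energy per operation in picojoules."""
    ops_per_joule = gops_per_watt * 1e9   # 1 GOPS/W equals 1e9 operations per joule
    joules_per_op = 1.0 / ops_per_joule
    return joules_per_op * 1e12           # joules to picojoules


# 370 GOPS/W is the figure quoted in the TL;DR.
abi_pj = energy_per_op_pj(370.0)
print(f"{abi_pj:.2f} pJ per operation")  # prints "2.70 pJ per operation"
```

In other words, the quoted efficiency corresponds to roughly 2.7 pJ per operation.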
Abstract
We present a tightly integrated and unified near-memory GPU architecture that delivers 6 to 16 times speedup and 6 to 13 times energy savings across Convolutional Neural Network, Graph Convolutional Network, Linear Programming, Large Language Model, and Ising workloads compared to the MIAOW GPU. The design includes a custom sparsity-aware near-memory circuit providing about 1.5 times energy savings, and a lightweight softmax circuit providing about 1.6 times energy savings. The architecture supports reconfigurable compute up to INT16 with dynamic resolution updates and scales efficiently across problem sizes. ABI-enabled MI300 and Blackwell systems achieve about 4.5 times speedup over the baseline MI300 and Blackwell systems.
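The abstract does not describe how the sparsity-aware circuit works. One common way hardware exploits sparsity is zero skipping, where a multiply-accumulate (MAC) is not issued when either operand is zero. The Python sketch below is a software model of that general idea on INT16 operands, assuming zero skipping for illustration only; it is not the authors' circuit, and the function name `zero_skipping_dot` and the sample vectors are hypothetical. The 1.5 times energy figure is the paper's reported result and is not derived from this model.

```python
from typing import Sequence, Tuple

INT16_MIN, INT16_MAX = -32768, 32767


def zero_skipping_dot(a: Sequence[int], b: Sequence[int]) -> Tuple[int, int]:
    """Return (dot product, number of MACs actually performed)."""
    if len(a) != len(b):
        raise ValueError("operands must have equal length")
    acc = 0
    macs = 0
    for x, y in zip(a, b):
        if not (INT16_MIN <= x <= INT16_MAX and INT16_MIN <= y <= INT16_MAX):
            raise ValueError("operands must fit in INT16")
        if x == 0 or y == 0:
            continue  # the product is zero, so this MAC is skipped
        acc += x * y
        macs += 1
    return acc, macs


# Hypothetical sparse operands, chosen only to show the skipping behaviour.
weights = [3, 0, 0, -2, 0, 5, 0, 0]
activations = [1, 7, 0, 4, 9, 0, 2, 0]
result, macs = zero_skipping_dot(weights, activations)
print(result, macs, len(weights))  # prints "-5 2 8"
```

Here only 2 of the 8 element pairs require a MAC, and the result matches the dense dot product. How far work saved in this way translates into energy saved depends on the circuit, which is what the paper's measurement captures.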
