DAISM: Digital Approximate In-SRAM Multiplier-based Accelerator for DNN Training and Inference
Lorenzo Sonnino, Shaswot Shresthamali, Yuan He, Masaaki Kondo
TL;DR
This work targets the data-movement bottleneck in DNN GEMMs by introducing a digital in-SRAM approximate multiplier and the DAISM accelerator. The multiplier uses bit-parallel full-line activation to perform in-memory multiplication as a wired-OR of partial products, avoiding complex adder trees, and can be augmented with pre-computed partial-sum values to recover accuracy. The DAISM architecture leverages this multiplier, exploring FP mantissa processing with bf16 and PC2/PC3 variants, including truncation modes to trade accuracy for energy and throughput. Across accuracy, energy, and architectural metrics, DAISM demonstrates substantially higher area efficiency than state-of-the-art SRAM-based PIM solutions, with competitive energy efficiency and robust performance when scaling banked SRAM configurations, making it practical for edge and near-edge DNN workloads.
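The idea of replacing the adder tree with a wired-OR of partial products can be illustrated in software. The following is a minimal sketch, not the paper's circuit or its exact algorithm: it assumes unsigned integer operands and models the approximation as a bitwise OR of the shifted copies of the multiplicand selected by the set bits of the multiplier. The function name `or_approx_mul` and the `bits` parameter are hypothetical and chosen only for illustration.

```python
def or_approx_mul(a: int, b: int, bits: int = 8) -> int:
    """Approximate a * b by OR-ing partial products instead of adding them.

    Illustrative model only (assumption): each set bit i of `b` selects the
    partial product `a << i`, and the selected partial products are combined
    with bitwise OR, which drops the carries an adder would produce.
    """
    result = 0
    for i in range(bits):
        if (b >> i) & 1:          # bit i of the multiplier is set
            result |= a << i      # OR in the shifted multiplicand
    return result


# The OR is exact when the selected partial products share no set bits,
# and underestimates the product otherwise.
exact_case = or_approx_mul(5, 4)    # single partial product: 5 << 2
lossy_case = or_approx_mul(3, 3)    # 0b011 | 0b110 = 0b111, exact product is 9
```

In this model the result never exceeds the exact product, because OR-ing non-negative integers cannot yield more than their sum. This is consistent with the text's point that extra pre-computed values can be used to recover some of the lost accuracy.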
Abstract
DNNs are widely used but incur significant computational costs from matrix multiplications, driven largely by data movement between memory and processing units. Processing-in-Memory (PIM) is therefore a promising approach, as it greatly reduces this overhead. However, most PIM solutions rely either on novel memory technologies that have yet to mature or on bit-serial computations that have significant performance overhead and scalability issues. Our work proposes an in-SRAM digital multiplier that uses conventional memory to perform bit-parallel computations by activating multiple wordlines. We then introduce DAISM, an architecture leveraging this multiplier, which achieves up to two orders of magnitude higher area efficiency than SOTA counterparts, with competitive energy efficiency.
