PACiM: A Sparsity-Centric Hybrid Compute-in-Memory Architecture via Probabilistic Approximation

Wenlun Zhang, Shimpei Ando, Yung-Chin Chen, Satomi Miyagi, Shinya Takamaeda-Yamazaki, Kentaro Yoshioka

TL;DR

PACiM introduces a sparsity-centric Compute-in-Memory architecture powered by Probabilistic Approximate Computation to transform MAC operations into simple scalar, sparsity-driven computations. By encoding weights and activations as bit-level sparsity and eliminating LSB data transfers, PACiM reduces memory accesses and data movement while preserving accuracy across CIFAR-10, CIFAR-100, and ImageNet with ResNet-18. The architecture combines D-CiM banks for MSB determinism with a PAC-based Compute Engine and on-die sparsity encoder, achieving up to 14.63 TOPS/W (8b/8b) in 65 nm and reducing bit-serial cycles by about 81%. Dynamic workload configuration further lowers energy by leveraging SPEC-based saliency to move less critical computations into the sparsity domain, delivering substantial system-level efficiency gains with minimal accuracy loss.

Abstract

Approximate computing is emerging as a promising approach to enhance the efficiency of compute-in-memory (CiM) systems in deep neural network processing. However, traditional approximate techniques often trade off significant accuracy for power efficiency, and they fail to reduce data transfer between main memory and CiM banks, which dominates power consumption. This paper introduces a novel probabilistic approximate computation (PAC) method that leverages statistical techniques to approximate multiply-and-accumulate (MAC) operations, reducing approximation error by 4× compared to existing approaches. PAC enables efficient sparsity-based computation in CiM systems by simplifying complex MAC vector computations into scalar calculations. Moreover, PAC enables sparsity encoding and eliminates the transmission of LSB activations, significantly reducing data reads and writes. This sets PAC apart from traditional approximate computing techniques: it reduces not only computation power but also memory accesses, the latter by 50%, thereby boosting system-level efficiency. We developed PACiM, a sparsity-centric architecture that fully exploits sparsity to reduce bit-serial cycles by 81% and achieves a peak 8b/8b efficiency of 14.63 TOPS/W in 65 nm CMOS while maintaining high accuracy of 93.85/72.36/66.02% on CIFAR-10/CIFAR-100/ImageNet benchmarks using a ResNet-18 model, demonstrating the effectiveness of our PAC methodology.
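To make the "vector MAC to scalar" idea concrete, the following is a minimal sketch and not the authors' implementation. It assumes unsigned integer operands, treats the bits of each bit plane as independent, and estimates the popcount of a bitwise AND between an activation plane and a weight plane of length N as N · d_x · d_w, where d is the fraction of 1s in the plane. The function names, the `exact_bits` parameter, and the rule that only MSB-by-MSB plane pairs are computed exactly are illustrative assumptions.

```python
import random

def bit_plane(values, bit):
    """Extract one bit position from every unsigned integer in `values`."""
    return [(v >> bit) & 1 for v in values]

def ones_density(bits):
    """Fraction of 1s in a bit vector: the single scalar kept per bit plane."""
    return sum(bits) / len(bits)

def exact_mac(acts, weights):
    """Reference dot product."""
    return sum(a * w for a, w in zip(acts, weights))

def hybrid_mac(acts, weights, n_bits=8, exact_bits=4):
    """Bit-serial MAC with an assumed hybrid split.

    A plane pair (p, q) is computed exactly (AND + popcount) only when both
    p and q lie in the top `exact_bits` positions; every other pair is
    estimated from the two plane densities alone.
    """
    n = len(acts)
    msb_start = n_bits - exact_bits
    total = 0.0
    for p in range(n_bits):
        a_bits = bit_plane(acts, p)
        for q in range(n_bits):
            w_bits = bit_plane(weights, q)
            if p >= msb_start and q >= msb_start:
                # Deterministic path: one bit-serial cycle on the full vectors.
                partial = sum(a & w for a, w in zip(a_bits, w_bits))
            else:
                # Probabilistic path: a scalar product of two densities.
                partial = n * ones_density(a_bits) * ones_density(w_bits)
            total += partial * (1 << (p + q))
    return total

if __name__ == "__main__":
    rng = random.Random(0)
    acts = [rng.randrange(256) for _ in range(1024)]
    weights = [rng.randrange(256) for _ in range(1024)]
    ref = exact_mac(acts, weights)
    est = hybrid_mac(acts, weights)
    print("exact:", ref, "hybrid:", est, "relative error:", abs(est - ref) / ref)
```

In this sketch the estimated pairs need only the per-plane densities, so the low-order bit vectors themselves never have to be read for those pairs. The estimate tends to be more reliable for long dot products, since the density-based term is an expectation over many elements.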

Paper Structure (20 sections, 5 equations, 7 figures, 4 tables)

Figures (7)

  • Figure 1: General Concept of PACiM: This architecture utilizes a novel PAC method to transform traditional vector computations into a single scalar operation. It features data compression into bit-level sparsity, while eliminating all LSB data transfers.
  • Figure 2: Probabilistic Approximate Computation: Model CiM operations using principles of statistical inference.
  • Figure 3: Approximate Error Analysis: (a) Weight/activation sparsity dependence across each bit index. (b) Distribution of actual MAC operations for typical weight/activation sparsity combinations. (c) RMSE variation across different DP lengths.
  • Figure 4: Computing map of the PACiM architecture.
  • Figure 5: PACiM Architecture: D-CiM banks for deterministic computing and CnM processing unit for PAC processing.
  • ...and 2 more figures