CADC: Crossbar-Aware Dendritic Convolution for Efficient In-memory Computing

Shuai Dong, Junyi Yang, Ye Ke, Hongyang Shang, Arindam Basu

TL;DR

This work introduces CADC, a crossbar-aware dendritic convolution method that embeds a nonlinear dendritic function within crossbar computations, zeroing negative partial sums (psums) inside each crossbar's computation. The resulting psum sparsity enables zero-compression and zero-skipping, which reduce buffer, transfer, and accumulation overhead, and it also limits the accumulation of ADC quantization noise, so accuracy loss stays small. Across CNNs and SNNs, CADC removes a large share of psums, with accuracy changes relative to vanilla convolution ranging from -0.57% to +1.60%, and an SRAM IMC implementation delivers an 11×–18× speedup and a 1.9×–22.9× improvement in energy efficiency over existing IMC accelerators. The approach is general and extendable to RRAM IMC, offering a scalable path to efficient, high-performance in-memory CNN and SNN acceleration.
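The core difference between vanilla convolution (vConv) and CADC can be illustrated on a single output activation. The sketch below assumes that the dendritic function is a ReLU applied to each crossbar's psum (the abstract describes it as zeroing negative values), that one output's unrolled dot product is split into fixed-size row chunks with one chunk per crossbar, and it uses made-up inputs and weights; all function names are hypothetical.

```python
from typing import List, Tuple


def crossbar_psums(x: List[float], w: List[float], rows: int) -> List[float]:
    """Split one unrolled dot product into chunks of `rows` inputs (one chunk per
    crossbar) and return the partial sum (psum) each crossbar produces."""
    if len(x) != len(w):
        raise ValueError("input and weight vectors must have the same length")
    psums = []
    for start in range(0, len(x), rows):
        pairs = zip(x[start:start + rows], w[start:start + rows])
        psums.append(sum(xi * wi for xi, wi in pairs))
    return psums


def vconv_output(psums: List[float]) -> float:
    """Vanilla convolution: every raw psum, positive or negative, is accumulated."""
    return sum(psums)


def cadc_output(psums: List[float]) -> Tuple[float, List[float]]:
    """CADC-style output: a dendritic nonlinearity, assumed here to be ReLU
    (zeroing negative values), is applied to each psum before accumulation."""
    dendritic = [max(p, 0.0) for p in psums]
    return sum(dendritic), dendritic


def sparsity(values: List[float]) -> float:
    """Fraction of entries that are exactly zero."""
    return sum(1 for v in values if v == 0.0) / len(values)


if __name__ == "__main__":
    x = [1.0, 2.0, 1.0, 1.0, 3.0, 1.0, 2.0, 2.0]
    w = [1.0, -1.0, 2.0, -3.0, 1.0, 1.0, -1.0, -2.0]
    p = crossbar_psums(x, w, rows=2)    # four 2-row "crossbars"
    print(p)                            # [-1.0, -1.0, 4.0, -6.0]
    print(vconv_output(p))              # -4.0
    print(cadc_output(p))               # (4.0, [0.0, 0.0, 4.0, 0.0])
    print(sparsity(cadc_output(p)[1]))  # 0.75
```

Because CADC changes the function being computed, its output differs from vConv's (4.0 versus -4.0 in this toy case); the network is therefore trained with CADC in place, which is what the CADC-versus-vConv training comparison in Figure 4 reflects. Three of the four psums become zero and need not be stored or accumulated.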

Abstract

Convolutional neural networks (CNNs) are computationally intensive and are often accelerated using crossbar-based in-memory computing (IMC) architectures. However, large convolutional layers must be partitioned across multiple crossbars, generating numerous partial sums (psums) that require additional buffering, transfer, and accumulation, which introduces significant system-level overhead. Inspired by dendritic computing principles from neuroscience, we propose crossbar-aware dendritic convolution (CADC), a novel approach that dramatically increases psum sparsity by embedding a nonlinear dendritic function (zeroing negative values) directly within crossbar computations. Experimental results demonstrate that CADC significantly reduces psums, eliminating 80% in LeNet-5 on MNIST, 54% in ResNet-18 on CIFAR-10, 66% in VGG-16 on CIFAR-100, and up to 88% in spiking neural networks (SNNs) on the DVS Gesture dataset. The induced sparsity provides two key benefits: (1) it enables zero-compression and zero-skipping, reducing buffer and transfer overhead by 29.3% and accumulation overhead by 47.9%; (2) it minimizes the accumulation of ADC quantization noise, so accuracy degradation stays small: only 0.01% for LeNet-5, 0.1% for ResNet-18, 0.5% for VGG-16, and 0.9% for the SNN. Compared to vanilla convolution (vConv), CADC exhibits accuracy changes ranging from +0.11% to +0.19% for LeNet-5, -0.04% to -0.27% for ResNet-18, +0.99% to +1.60% for VGG-16, and -0.57% to +1.32% for the SNN, across crossbar sizes from 64×64 to 256×256. Ultimately, an SRAM-based IMC implementation of CADC achieves 2.15 TOPS and 40.8 TOPS/W for ResNet-18 (4/2/4b), realizing an 11×–18× speedup and a 1.9×–22.9× improvement in energy efficiency compared to existing IMC accelerators.
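The first benefit, zero-compression plus zero-skipping, can be sketched as follows under loudly stated assumptions: psums are stored as a bitmap plus the nonzero values (one common zero-compression format, not necessarily the one used in the paper), psums are assumed to be 8 bits wide for the storage estimate, the psum values are made up, and all function names are hypothetical. The 29.3% and 47.9% reductions quoted above come from the paper's own evaluation, not from this toy model.

```python
from typing import List, Tuple


def zero_compress(psums: List[float]) -> Tuple[List[int], List[float]]:
    """Encode a psum vector as a one-bit-per-entry occupancy bitmap plus the
    nonzero values only (an assumed bitmap-style zero-compression format)."""
    bitmap = [1 if p != 0.0 else 0 for p in psums]
    nonzeros = [p for p in psums if p != 0.0]
    return bitmap, nonzeros


def zero_decompress(bitmap: List[int], nonzeros: List[float]) -> List[float]:
    """Rebuild the dense psum vector from its compressed form."""
    values = iter(nonzeros)
    return [next(values) if bit else 0.0 for bit in bitmap]


def accumulate_skipping_zeros(outputs: List[float], bitmap: List[int],
                              nonzeros: List[float]) -> int:
    """Add one crossbar's compressed psums into the output buffer in place,
    skipping zero entries; returns the number of additions performed."""
    values = iter(nonzeros)
    additions = 0
    for col, bit in enumerate(bitmap):
        if bit:  # zero-skipping: a zeroed psum costs no addition
            outputs[col] += next(values)
            additions += 1
    return additions


def dense_bits(n: int, value_bits: int) -> int:
    """Storage for n uncompressed psums of `value_bits` bits each."""
    return n * value_bits


def compressed_bits(bitmap: List[int], nonzeros: List[float], value_bits: int) -> int:
    """Storage for the bitmap plus the nonzero psums (hypothetical cost model)."""
    return len(bitmap) + len(nonzeros) * value_bits


if __name__ == "__main__":
    # Psums from two crossbars feeding the same 8 output columns (made-up values).
    xbar1 = [0.0, 3.0, 0.0, 0.0, 5.0, 0.0, 0.0, 1.0]
    xbar2 = [2.0, 0.0, 0.0, 0.0, 1.0, 0.0, 4.0, 0.0]
    outputs = [0.0] * 8
    adds = 0
    for psums in (xbar1, xbar2):
        adds += accumulate_skipping_zeros(outputs, *zero_compress(psums))
    print(outputs)  # [2.0, 3.0, 0.0, 0.0, 6.0, 0.0, 4.0, 1.0]
    print(adds)     # 6 additions instead of 16
    bm, nz = zero_compress(xbar1)
    print(dense_bits(len(xbar1), 8), compressed_bits(bm, nz, 8))  # 64 32
```

In this format the accumulator never touches a column whose bitmap bit is 0. The bitmap itself costs one bit per psum, so with 8-bit psums the compressed form is smaller than dense storage only once more than one in eight psums is zero, which is why raising psum sparsity is what makes compression and skipping pay off.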

Paper Structure

This paper contains 10 sections, 4 equations, 10 figures, and 2 tables.

Figures (10)

  • Figure 1: (a) Energy breakdown of a 65 nm SRAM IMC accelerator running VGG-8 on CIFAR-10, as modeled with NeuroSim peng2020dnn+. (b) Comparison of normalized psum counts between vanilla convolution (vConv) and CADC when mapping the 6th convolution layer onto different crossbars.
  • Figure 2: Comparison of CADC and vConv in (a) software and (b) hardware.
  • Figure 3: Overall macro structure: (a) Hardware structure. (b) Details of the twin 9T SRAM bitcell. (c) Single-column circuit diagram with the corresponding timing characteristics for MAC and IMA.
  • Figure 4: Comparative training analysis of CADC and vConv using LeNet-5 on MNIST, ResNet-18 on CIFAR-10, VGG-16 on CIFAR-100, and an SNN on the DVS Gesture dataset.
  • Figure 5: Psum sparsity comparison between vConv and CADC across all convolution layers in (a) LeNet-5 on MNIST, (b) ResNet-18 on CIFAR-10, (c) VGG-16 on CIFAR-100, and (d) SNN on DVS Gesture.
  • ...and 5 more figures
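Figure 1(b) compares psum counts when one convolution layer is mapped onto different crossbars. The arithmetic behind that kind of comparison can be sketched as below, assuming a common im2col-style mapping in which each filter's K×K×Cin weights are unrolled down crossbar rows and output channels are spread across columns. The mapping, the layer shape, and the function names are illustrative assumptions, not the paper's configuration. Column tiling (more output channels than crossbar columns) adds crossbars but not psums per output, so it is left out.

```python
def psums_per_output(kernel: int, in_channels: int, xbar_rows: int) -> int:
    """Number of row tiles (crossbars) one output activation's dot product is
    split across, assuming a filter's K*K*Cin weights are unrolled along
    crossbar rows. Each tile yields one psum (integer ceiling division)."""
    rows_needed = kernel * kernel * in_channels
    return (rows_needed + xbar_rows - 1) // xbar_rows


def total_psums(kernel: int, in_channels: int, out_channels: int,
                out_h: int, out_w: int, xbar_rows: int) -> int:
    """Psums generated for a whole layer: one psum per row tile, per output
    channel, per output position."""
    per_output = psums_per_output(kernel, in_channels, xbar_rows)
    return per_output * out_channels * out_h * out_w


if __name__ == "__main__":
    # Hypothetical 3x3 conv layer with 256 input channels (2304 unrolled rows).
    for rows in (64, 128, 256):
        print(rows, psums_per_output(3, 256, rows))  # 36, 18, 9 psums per output
    print(total_psums(3, 256, 256, 8, 8, 128))       # 294912
```

A result of 1 means a filter fits in a single crossbar's rows and no cross-crossbar accumulation is needed; smaller crossbars multiply the psum count. CADC does not change how many psums are produced; it makes a large fraction of them zero so that, with zero-compression and zero-skipping, those entries need not be buffered, transferred, or accumulated.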