CADC: Crossbar-Aware Dendritic Convolution for Efficient In-memory Computing

Shuai Dong; Junyi Yang; Ye Ke; Hongyang Shang; Arindam Basu

TL;DR

This work introduces CADC, a crossbar-aware dendritic convolution method that embeds a nonlinear dendritic function (zeroing negative values) within crossbar computations, so that negative partial sums (psums) are removed before they leave the crossbar. The resulting psum sparsity enables zero-compression and zero-skipping, which reduce buffer, transfer, and accumulation overhead, and it limits the accumulation of ADC quantization noise, keeping accuracy loss small. Across CNNs and SNNs, CADC eliminates 54% to 88% of psums, with accuracy changes between -0.57% and +1.60% relative to vanilla convolution. An SRAM-based IMC implementation achieves an 11×–18× speedup and a 1.9×–22.9× improvement in energy efficiency over existing IMC accelerators. The approach is general and extendable to RRAM IMC, offering a scalable path to efficient in-memory CNN and SNN acceleration.

Abstract

Convolutional neural networks (CNNs) are computationally intensive and often accelerated using crossbar-based in-memory computing (IMC) architectures. However, large convolutional layers must be partitioned across multiple crossbars, generating numerous partial sums (psums) that require additional buffering, transfer, and accumulation, thus introducing significant system-level overhead. Inspired by dendritic computing principles from neuroscience, we propose crossbar-aware dendritic convolution (CADC), a novel approach that dramatically increases sparsity in psums by embedding a nonlinear dendritic function (zeroing negative values) directly within crossbar computations. Experimental results demonstrate that CADC significantly reduces psums, eliminating 80% in LeNet-5 on MNIST, 54% in ResNet-18 on CIFAR-10, 66% in VGG-16 on CIFAR-100, and up to 88% in a spiking neural network (SNN) on the DVS Gesture dataset. The sparsity induced by CADC provides two key benefits: (1) it enables zero-compression and zero-skipping, reducing buffer and transfer overhead by 29.3% and accumulation overhead by 47.9%; (2) it minimizes the accumulation of ADC quantization noise, resulting in small accuracy degradation: only 0.01% for LeNet-5, 0.1% for ResNet-18, 0.5% for VGG-16, and 0.9% for the SNN. Compared to vanilla convolution (vConv), CADC exhibits accuracy changes ranging from +0.11% to +0.19% for LeNet-5, -0.04% to -0.27% for ResNet-18, +0.99% to +1.60% for VGG-16, and -0.57% to +1.32% for the SNN, across crossbar sizes from 64×64 to 256×256. Ultimately, an SRAM-based IMC implementation of CADC achieves 2.15 TOPS and 40.8 TOPS/W for ResNet-18 (4/2/4b), realizing an 11×–18× speedup and a 1.9×–22.9× improvement in energy efficiency compared to existing IMC accelerators.
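
The mechanism described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: it assumes a convolution unrolled into a matrix-vector product (im2col style), ignores ADC quantization and training, and uses hypothetical names (`split_rows`, `vconv_matvec`, `cadc_matvec`, `zero_skip_accumulate`) chosen only for this illustration. Each crossbar holds a slice of the weight rows and produces one psum per output column; the CADC-style variant zeroes negative psums before accumulation, and the zero-skipping accumulator counts how many additions remain.

```python
import numpy as np

def split_rows(n_rows, xbar_rows):
    """Return (start, stop) row ranges, one per crossbar (hypothetical helper)."""
    return [(s, min(s + xbar_rows, n_rows)) for s in range(0, n_rows, xbar_rows)]

def vconv_matvec(W, x, xbar_rows):
    """Vanilla partitioned product: every crossbar emits raw psums."""
    psums = np.stack([x[a:b] @ W[a:b] for a, b in split_rows(W.shape[0], xbar_rows)])
    return psums.sum(axis=0), psums

def cadc_matvec(W, x, xbar_rows):
    """CADC-style product: negative psums are zeroed inside each crossbar."""
    psums = np.stack([np.maximum(x[a:b] @ W[a:b], 0.0)
                      for a, b in split_rows(W.shape[0], xbar_rows)])
    return psums.sum(axis=0), psums

def zero_skip_accumulate(psums):
    """Accumulate only nonzero psums; return (output, additions performed)."""
    out = np.zeros(psums.shape[1])
    adds = 0
    for row in psums:                 # one row of psums per crossbar
        nz = np.nonzero(row)[0]       # indices that survive zero-compression
        out[nz] += row[nz]
        adds += nz.size
    return out, adds

# Illustrative usage with random data (assumed shapes, not from the paper):
# 576 unrolled inputs (e.g. 3x3 kernel, 64 channels), 64 output channels,
# and crossbars with 64 rows, giving 9 crossbars.
rng = np.random.default_rng(0)
W = rng.standard_normal((576, 64))
x = rng.random(576)
y_cadc, psums = cadc_matvec(W, x, xbar_rows=64)
y_skip, adds = zero_skip_accumulate(psums)
print("psum sparsity:", 1.0 - np.count_nonzero(psums) / psums.size)
print("additions performed:", adds, "of", psums.size)
```

Note that the CADC-style output generally differs from the vanilla result, since negative psums no longer cancel positive ones; per the abstract, the network is evaluated with this nonlinearity in place, which this inference-only sketch on random weights does not model.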
