
BasisN: Reprogramming-Free RRAM-Based In-Memory-Computing by Basis Combination for Deep Neural Networks

Amro Eldebiky, Grace Li Zhang, Xunzhao Yin, Cheng Zhuo, Ing-Chao Lin, Ulf Schlichtmann, Bing Li

TL;DR

BasisN tackles the crossbar reprogramming bottleneck in RRAM-based in-memory computing (IMC) by representing all DNN kernels as linear combinations of a fixed global basis set that is written to the crossbars once, with layer-specific, quantized coefficients. The framework introduces a hardware-friendly kernel decomposition, a multibit time-multiplexing scheme, and an alternating training process that jointly optimizes the basis and the coefficients, plus a contest-aware regularization that enables parallelization. Experiments on DenseNet and ResNet show near-baseline accuracy, with cycles per inference and energy-delay product reduced to below 1% of those of reprogramming-based approaches. By removing the reprogramming overhead at negligible hardware cost, the approach makes it practical to deploy large DNNs on RRAM chips with a limited number of crossbars.

Abstract

Deep neural networks (DNNs) have made breakthroughs in various fields including image recognition and language processing. DNNs execute hundreds of millions of multiply-and-accumulate (MAC) operations. To accelerate such computations efficiently, analog in-memory-computing platforms have been developed that leverage emerging devices such as resistive RAM (RRAM). However, such accelerators must have enough on-chip crossbars to hold all the weights of a DNN. Otherwise, the RRAM cells in the crossbars need to be reprogrammed to process further layers, which causes huge time and energy overhead due to the extremely slow writing and verification of the RRAM cells. As a result, it is still not possible to deploy such accelerators to process large-scale DNNs in industry. To address this problem, we propose the BasisN framework to accelerate DNNs on any number of available crossbars without reprogramming. BasisN introduces a novel representation of the kernels in DNN layers as combinations of global basis vectors that are shared between all layers, with quantized coefficients. These basis vectors are written to the crossbars only once and used for the computations of all layers with marginal hardware modification. BasisN also provides a novel training approach to enhance computation parallelization with the global basis vectors and to optimize the coefficients that construct the kernels. Experimental results demonstrate that cycles per inference and energy-delay product were reduced to below 1% compared with applying reprogramming on crossbars when processing large-scale DNNs such as DenseNet and ResNet on the ImageNet and CIFAR100 datasets, while the training and hardware costs are negligible.
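The idea of building kernels from a shared basis with quantized coefficients can be illustrated with a small numerical sketch. It is not the authors' decomposition or training procedure, which this summary does not spell out. It assumes an orthonormal demo basis, a plain least-squares fit, and symmetric uniform quantization, and every name in it (`quantize_uniform`, `fit_coefficients`, `basis_matvec`) is hypothetical.

```python
import numpy as np


def quantize_uniform(c, bits):
    """Symmetric uniform quantization to signed integers (assumed scheme)."""
    qmax = 2 ** (bits - 1) - 1
    max_abs = np.max(np.abs(c))
    scale = max_abs / qmax if max_abs > 0 else 1.0
    q = np.clip(np.round(c / scale), -qmax, qmax).astype(np.int32)
    return q, scale


def fit_coefficients(basis, weights, bits):
    """Least-squares coefficients so that basis @ coeffs ~ weights, then quantize."""
    coeffs, *_ = np.linalg.lstsq(basis, weights, rcond=None)
    return quantize_uniform(coeffs, bits)


def basis_matvec(x, basis, q, scale):
    """Compute x @ W using only the fixed basis and the quantized coefficients."""
    basis_out = x @ basis            # stands in for the analog crossbar step
    return (basis_out @ q) * scale   # per-layer combination of the basis outputs


rng = np.random.default_rng(0)
d, k = 16, 8                         # illustrative partition size and kernel count

# Orthonormal basis chosen only to keep the demo well conditioned (assumption).
basis, _ = np.linalg.qr(rng.standard_normal((d, d)))
weights = rng.standard_normal((d, k))

q, scale = fit_coefficients(basis, weights, bits=8)
reconstructed = basis @ q * scale
rel_err = np.linalg.norm(weights - reconstructed) / np.linalg.norm(weights)

x = rng.standard_normal(d)
approx = basis_matvec(x, basis, q, scale)
print(f"relative reconstruction error: {rel_err:.4f}")
```

In this sketch only `q` and `scale` differ from layer to layer while `basis` never changes, which mirrors the claim that the basis vectors are written once and reused by all layers.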

Paper Structure

This paper contains 13 sections, 9 figures, 1 table, 1 algorithm.

Figures (9)

  • Figure 1: (a) The structure of an RRAM crossbar. (b) The structure of an RRAM cell.
  • Figure 2: (a) Performance slowdown due to reprogramming when several benchmarks are deployed on 48 RRAM crossbars of size $256\times 256$, with row-based reprogramming [merced2016repeatable] and block-based reprogramming [chen2023novel]. (b) The compression ratios achieved by EPIM [wang2023epim] and PIM_prune [chu2020pim] for the DenseNet-ImageNet benchmark and the compression ratio required to avoid reprogramming.
  • Figure 3: BasisN representation of the weights of a convolutional layer. (a) The kernels of the layer, the reshaping of the kernels into a 2D weight matrix, and its partitioning into $d \times d$ submatrices that fit into the crossbars. (b) The representation of a kernel partition as a combination of the basis vectors. (c) The implementation of the BasisN representation on the crossbar hardware.
  • Figure 4: Basis contest between kernels and how it affects parallelization.
  • Figure 5: Inference accuracy with respect to the bitwidth of the control coefficient and the crossbar size for (a) DenseNet-ImageNet, (b) DenseNet-CIFAR100, and (c) ResNet-CIFAR100.
  • ...and 4 more figures