HURRY: Highly Utilized, Reconfigurable ReRAM-based In-situ Accelerator with Multifunctionality

Hery Shin, Jae-Young Kim, Donghyuk Kim, Joo-Young Kim

TL;DR

ReRAM-based in-situ accelerators for CNNs suffer from spatial and temporal underutilization, driven by suboptimal array sizing and excessive data movement. HURRY introduces reconfigurable, multifunctional ReRAM arrays with a Block Activation Scheme and system-level scheduling to raise both spatial and temporal utilization while reducing periphery overhead. Key contributions include a 512×512 unit ReRAM array per IMA; reconfigurable block activation; Conv/Res/FC, Max/ReLU, and Softmax functional blocks (FBs); and inter-FB and intra-FB scheduling with HMS dataflow. Evaluations show up to $3.35\times$ speedup, $5.72\times$ higher energy efficiency, and $7.91\times$ higher area efficiency over baseline ReRAM-based accelerators, along with substantial gains in spatial and temporal utilization.

Abstract

Resistive random-access memory (ReRAM) crossbar arrays are suitable for efficient inference computations in neural networks due to their analog general matrix-matrix multiplication (GEMM) capabilities. However, traditional ReRAM-based accelerators suffer from spatial and temporal underutilization. We present HURRY, a reconfigurable and multifunctional ReRAM-based in-situ accelerator. HURRY uses a block activation scheme for concurrent activation of dynamically sized ReRAM portions, enhancing spatial utilization. Additionally, it incorporates functional blocks for convolution, ReLU, max pooling, and softmax computations to improve temporal utilization. System-level scheduling and data mapping strategies further optimize performance. Consequently, HURRY achieves up to 3.35x speedup, 5.72x higher energy efficiency, and 7.91x greater area efficiency compared to current ReRAM-based accelerators.
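
To make the spatial underutilization problem concrete, the following is a minimal sketch, not HURRY's method. It assumes a simple model in which a layer's weight matrix is tiled onto fixed-size square crossbar arrays with one cell per weight, and utilization is the fraction of allocated cells that actually hold weights. The function name `spatial_utilization` and the example layer shape are hypothetical and do not come from the paper.

```python
import math


def spatial_utilization(rows: int, cols: int, array_size: int) -> float:
    """Fraction of allocated crossbar cells that hold weights.

    Assumption: a (rows x cols) weight matrix is tiled onto square
    arrays of side `array_size`, one cell per weight, no sharing of
    an array between layers.
    """
    arrays = math.ceil(rows / array_size) * math.ceil(cols / array_size)
    return (rows * cols) / (arrays * array_size ** 2)


# Hypothetical 3x3 convolution with 64 input and 128 output channels:
# unrolled weight matrix is (3*3*64) x 128 = 576 x 128.
rows, cols = 3 * 3 * 64, 128
for n in (128, 256, 512):
    print(n, round(spatial_utilization(rows, cols, n), 3))
```

Under this model, larger fixed arrays leave more cells idle for the same layer, which is the kind of loss that activating dynamically sized portions of a large array is meant to recover.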

Paper Structure (25 sections, 1 equation, 8 figures, 2 algorithms)

Figures (8)

  • Figure 1: (a) Unit array size vs. ReRAM spatial utilization rate (b) Unit array size vs. Chip size/ADC power consumption
  • Figure 2: Overall architecture of HURRY
  • Figure 3: Block activation scheme simultaneously writing FB1 and reading FB2
  • Figure 4: Functional block implementation (a) Combined convolutional and residual functional block and (b) max pooling functional block
  • Figure 5: (a) Fine-grained pipelining of FBs (b) Relative positioning and sizing of FBs within a ReRAM array (c) Max pooling and ReLU FBs merged by data mapping strategy
  • ...and 3 more figures