HURRY: Highly Utilized, Reconfigurable ReRAM-based In-situ Accelerator with Multifunctionality

Hery Shin; Jae-Young Kim; Donghyuk Kim; Joo-Young Kim

TL;DR

Resistive RAM-based in-situ accelerators for CNNs suffer from spatial and temporal underutilization, driven by suboptimal array sizing and excessive data movement. HURRY introduces reconfigurable, multifunctional ReRAM arrays with a Block Activation Scheme and system-level scheduling to boost both spatial and temporal utilization while reducing periphery overhead. Key contributions include a 512×512 unit ReRAM array per IMA; reconfigurable block activation; functional blocks for Conv/Res/FC, Max/ReLU, and Softmax; and inter-FB and intra-FB scheduling with HMS dataflow. Evaluations show up to $3.35\times$ speedup, $5.72\times$ higher energy efficiency, and $7.91\times$ greater area efficiency over baselines, along with substantial gains in spatial and temporal utilization, enabling practical, energy-efficient CNN inference on ReRAM-based hardware.

Abstract

Resistive random-access memory (ReRAM) crossbar arrays are suitable for efficient inference computations in neural networks due to their analog general matrix-matrix multiplication (GEMM) capabilities. However, traditional ReRAM-based accelerators suffer from spatial and temporal underutilization. We present HURRY, a reconfigurable and multifunctional ReRAM-based in-situ accelerator. HURRY uses a block activation scheme for concurrent activation of dynamically sized ReRAM portions, enhancing spatial utilization. Additionally, it incorporates functional blocks for convolution, ReLU, max pooling, and softmax computations to improve temporal utilization. System-level scheduling and data mapping strategies further optimize performance. Consequently, HURRY achieves up to 3.35x speedup, 5.72x higher energy efficiency, and 7.91x greater area efficiency compared to current ReRAM-based accelerators.
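
The spatial underutilization mentioned in the abstract can be illustrated with a toy model. The sketch below is not taken from the paper: it assumes a convolution layer's weights are flattened into a (K·K·C_in) × C_out matrix, one cell per weight, and tiled onto square crossbar arrays of a fixed size that are activated as a whole. The function names `conv_weight_shape` and `spatial_utilization` and the example layer shape are hypothetical.

```python
import math


def conv_weight_shape(kernel, c_in, c_out, cells_per_weight=1):
    """Rows and columns needed to hold a flattened conv weight matrix.

    Assumption: each output channel occupies `cells_per_weight` columns.
    """
    return kernel * kernel * c_in, c_out * cells_per_weight


def spatial_utilization(rows_needed, cols_needed, array_size):
    """Fraction of allocated crossbar cells that actually hold weights."""
    arrays_r = math.ceil(rows_needed / array_size)
    arrays_c = math.ceil(cols_needed / array_size)
    allocated = arrays_r * arrays_c * array_size * array_size
    return rows_needed * cols_needed / allocated


if __name__ == "__main__":
    # Hypothetical 3x3 conv layer with 64 input and 64 output channels.
    rows, cols = conv_weight_shape(3, 64, 64)
    for n in (128, 256, 512):
        print(n, spatial_utilization(rows, cols, n))
```

Under these assumptions, a fixed-size array that must be used whole wastes more cells as it grows. Activating only dynamically sized portions of a large array, as the block activation scheme is described as doing, is one way to recover that unused space.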

Paper Structure (25 sections, 1 equation, 8 figures, 2 algorithms)

Introduction
HURRY Architecture
Overview
Block Activation Scheme
Functional Block Implementation
Conv, Res, and FC
Max and ReLU
Softmax Support
Model-aware Scheduling
Inter-FB Scheduling
Inter-FB Mapping
FB Relative Positioning
FB Size Balancing
Intra-FB Data Mapping
Evaluation & Discussion
...and 10 more sections

Figures (8)

Figure 1: (a) Unit array size vs. ReRAM spatial utilization rate (b) Unit array size vs. Chip size/ADC power consumption
Figure 2: Overall architecture of HURRY
Figure 3: Block activation scheme simultaneously writing FB1 and reading FB2
Figure 4: Functional block implementation (a) Combined convolutional and residual functional block and (b) max pooling functional block
Figure 5: (a) Fine-grained pipelining of FBs (b) Relative positioning and sizing of FBs within a ReRAM array (c) Max pooling and ReLU FBs merged by data mapping strategy
...and 3 more figures
