MAC-DO: An Efficient Output-Stationary GEMM Accelerator for CNNs Using DRAM Technology

Minki Jeong, Wanyeong Jung

TL;DR

MAC-DO addresses the memory bottleneck in CNN inference by embedding an analog multi-bit MAC within DRAM, using two 1T1C cells per MAC-DO cell and charge steering to perform a MAC operation in a single cycle. The architecture employs an output-stationary data flow across a 2D MAC array, maximizing data reuse and minimizing data movement for matrix multiplications. It handles non-idealities with digital and analog corrections for mismatch, non-linearity, and leakage, and reports circuit-level results and system-level speedups, projecting multi-TOPS performance with high energy efficiency on DRAM-scale arrays. The work reports substantial gains over prior DRAM-based accelerators in computational density and energy efficiency, highlighting the potential of in-DRAM analog compute for edge AI deployments.

Abstract

DRAM-based in-situ accelerators have shown their potential in addressing the memory wall challenge of the traditional von Neumann architecture. Such accelerators exploit charge sharing or logic circuits for simple logic operations at the DRAM subarray level. However, their throughput is limited by low array utilization, as only a few rows of cells in a DRAM array participate in operations while most rows remain deactivated. Moreover, they require many cycles for more complex operations such as a multi-bit multiply-accumulate (MAC) operation, resulting in significant data access and movement and potentially worsening power efficiency. To overcome these limitations, this paper presents MAC-DO, an efficient and low-power DRAM-based in-situ accelerator. Unlike previous DRAM-based in-situ accelerators, a MAC-DO cell, consisting of two 1T1C DRAM cells (two transistors and two capacitors), innately supports a multi-bit MAC operation within a single cycle, ensuring good linearity and compatibility with existing 1T1C DRAM cells and array structures. This is made possible by a novel analog computation method based on charge steering. Additionally, MAC-DO enables concurrent individual MAC operations in each MAC-DO cell without idle cells, significantly improving throughput and energy efficiency. As a result, a MAC-DO array can efficiently accelerate matrix multiplications based on output-stationary mapping, supporting the majority of computations performed in deep neural networks (DNNs). Furthermore, a MAC-DO array efficiently reuses three types of data (input, weight, and output), minimizing data movement.
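
To make the output-stationary mapping concrete, the following is a minimal digital sketch, not the analog charge-steering circuit described in the paper. It assumes an idealized M x N grid of accumulators, one per output element, and computes C = A x B as a sum of K outer products, in the spirit of the iterative outer products named in the caption of Figure 4. The function name `output_stationary_gemm` and the list-of-lists matrix layout are illustrative choices, not taken from the paper.

```python
def output_stationary_gemm(a, b):
    """Compute C = A @ B as a sum of K outer products.

    Hypothetical model: each accumulator c[i][j] stands in for one MAC cell
    and never moves (output stationary). At step k, element a[i][k] is shared
    by every cell in row i and element b[k][j] by every cell in column j.
    """
    m, k_dim = len(a), len(a[0])
    if len(b) != k_dim:
        raise ValueError("inner dimensions of A and B do not match")
    n = len(b[0])

    # One stationary partial sum per output element.
    c = [[0] * n for _ in range(m)]

    for k in range(k_dim):
        col = [a[i][k] for i in range(m)]  # m input operands for this step
        row = b[k]                         # n weight operands for this step
        # All m * n cells perform one multiply-accumulate in this step.
        for i in range(m):
            for j in range(n):
                c[i][j] += col[i] * row[j]
    return c


A = [[1, 2, 3],
     [4, 5, 6]]
B = [[7, 8],
     [9, 10],
     [11, 12]]
C = output_stationary_gemm(A, B)
print(C)  # [[58, 64], [139, 154]]
```

In this model each step fetches only M + N operands while performing M x N multiply-accumulates, and the partial sums are never written out until the final step. This illustrates the kind of input, weight, and output reuse the abstract refers to.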

Paper Structure (40 sections, 14 equations, 21 figures, 8 tables)

Figures (21)

  • Figure 1: Different types of DRAM-based processing techniques
  • Figure 2: Proposed MAC-DO architecture
  • Figure 3: Convolutions
  • Figure 4: Matrix multiplications through iterative outer products
  • Figure 5: The operation of a charge-steering amplifier
  • ...and 16 more figures