Analog or Digital In-memory Computing? Benchmarking through Quantitative Modeling

Jiacong Sun, Pouya Houshmand, Marian Verhelst

TL;DR

The paper addresses the challenge of fairly comparing Analog In-Memory Computing (AIMC) and Digital In-Memory Computing (DIMC) architectures across diverse SRAM-based implementations and process nodes. It proposes a unified analytical IMC performance model covering memory arrays, analog peripherals (ADCs/DACs), and digital peripherals, validates it against published AIMC/DIMC designs, and integrates it into the ZigZag system-level design space exploration framework. Through macro- and system-level benchmarking on target workloads (e.g., MLPerf Tiny), the study finds that DIMC generally achieves higher computational density, but AIMC with large macros can be more energy-efficient on convolutional and pointwise layers, while small DIMC macros outperform AIMC on depthwise layers. The results provide a quantitative basis for architectural choices in edge ML, letting designers choose between AIMC and DIMC configurations based on workload characteristics, and the open-source framework supports broader, workload-driven IMC exploration.
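Workload-driven exploration of this kind rests on describing each DNN layer as a nested loop over its dimensions (Figure 1 of the paper uses an 8-nested loop). The sketch below is a plain reference loop nest, not ZigZag's API: the dimension names B (batch), G (group), K, C, OY, OX, FY, FX and the stride-1, no-padding convention are assumptions made for illustration. Setting FY = FX = 1 gives a pointwise layer; setting K = C = 1 with G equal to the channel count gives a depthwise layer.

```python
from itertools import product


def conv_8loop(inp, wgt, B, G, K, C, OY, OX, FY, FX):
    """Reference 8-loop layer (assumed dimension names; stride 1, no padding).

    inp[b][g][c][iy][ix] with iy < OY + FY - 1 and ix < OX + FX - 1
    wgt[g][k][c][fy][fx]
    returns out[b][g][k][oy][ox]
    """
    out = [[[[[0] * OX for _ in range(OY)] for _ in range(K)]
            for _ in range(G)] for _ in range(B)]
    # itertools.product iterates like 8 nested for-loops, outermost first.
    for b, g, k, c, oy, ox, fy, fx in product(
            range(B), range(G), range(K), range(C),
            range(OY), range(OX), range(FY), range(FX)):
        out[b][g][k][oy][ox] += (inp[b][g][c][oy + fy][ox + fx]
                                 * wgt[g][k][c][fy][fx])
    return out


def mac_count(B, G, K, C, OY, OX, FY, FX):
    """Total MACs: one per innermost iteration, i.e. the product of all bounds."""
    return B * G * K * C * OY * OX * FY * FX


# Pointwise example: 2 input channels -> 2 output channels on a 1x1 map.
inp = [[[[[3]], [[4]]]]]
wgt = [[[[[1]], [[2]]], [[[5]], [[6]]]]]
out = conv_8loop(inp, wgt, B=1, G=1, K=2, C=2, OY=1, OX=1, FY=1, FX=1)
print(out[0][0][0][0][0], out[0][0][1][0][0])  # 11 39
```

Because each MAC is exactly one iteration of the innermost body, the layer's MAC count is the product of the eight loop bounds; how those bounds are split between spatial unrolling inside a macro and temporal iteration is what a mapper explores.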

Abstract

In-Memory Computing (IMC) has emerged as a promising paradigm for energy-, throughput- and area-efficient machine learning at the edge. However, differences in hardware architecture, array dimensions, and fabrication technology among published IMC realizations make it difficult to grasp their relative strengths. Moreover, previous studies have focused primarily on exploring and benchmarking the peak performance of a single IMC macro rather than full-system performance on real workloads. This paper addresses the lack of a quantitative comparison between Analog In-Memory Computing (AIMC) and Digital In-Memory Computing (DIMC) processor architectures. We propose an analytical IMC performance model that is validated against published implementations and integrated into a system-level exploration framework for comprehensive performance assessments on different workloads with varying IMC configurations. Our experiments show that while DIMC generally has higher computational density than AIMC, AIMC with large macro sizes may have better energy efficiency than DIMC on convolutional and pointwise layers, which allow high spatial unrolling. On the other hand, DIMC with small macro sizes outperforms AIMC on depthwise layers, which offer limited spatial unrolling opportunities inside a macro.
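Why depthwise layers favor small macros can be illustrated with a back-of-the-envelope spatial-utilization estimate. This is not the paper's model, which also accounts for peripheral (ADC/DAC, adder tree) costs; it is a sketch under stated assumptions: one weight per cell, the reduction dimension (C·FY·FX) mapped along rows whose inputs are broadcast to every column, and output channels mapped along columns. Under that assumption a depthwise channel only reduces over its own FY·FX weights, so channels can at best be packed block-diagonally. The helpers `dense_utilization` and `depthwise_utilization` are hypothetical names.

```python
from math import ceil


def dense_utilization(rows, cols, reduction, outputs):
    """Average fraction of cells holding useful weights for a conv/pointwise
    layer (assumed mapping): `reduction` = C*FY*FX along rows, `outputs` = K
    along columns, tiled over as many macro loads as needed."""
    row_tiles = ceil(reduction / rows)
    col_tiles = ceil(outputs / cols)
    return (reduction * outputs) / (row_tiles * rows * col_tiles * cols)


def depthwise_utilization(rows, cols, kernel, channels):
    """Depthwise layer (assumed mapping): each channel needs `kernel` = FY*FX
    rows of its own and one column, so channels are packed block-diagonally."""
    per_tile = min(rows // kernel, cols)
    if per_tile == 0:
        raise ValueError("kernel does not fit in the macro rows")
    tiles = ceil(channels / per_tile)
    return (channels * kernel) / (tiles * rows * cols)


for rows, cols in [(32, 32), (256, 256)]:
    pw = dense_utilization(rows, cols, reduction=256, outputs=256)
    dw = depthwise_utilization(rows, cols, kernel=9, channels=256)
    print(f"{rows}x{cols}: pointwise {pw:.3f}, depthwise {dw:.4f}")
```

With these illustrative sizes the pointwise layer fills both macros completely, while the 3x3 depthwise layer reaches several times higher utilization on the 32x32 macro than on the 256x256 one, since a large macro leaves most of each tile idle while its peripherals still operate.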

Paper Structure (26 sections, 10 equations, 12 figures, 4 tables)

Figures (12)

  • Figure 1: 8-nested-loop representation of a DNN layer, and workload representation
  • Figure 2: Fundamental architecture for AIMC/DIMC and their mapping paradigm. "u" in the subscript represents spatial unrolling.
  • Figure 3: Peak performance benchmarking on AIMC/DIMC architectures. Each point reports the used technology node and weight bit precision.
  • Figure 4: Equivalent worst case scenario for ADC in AIMC macro. The transistor or transmission gate in each IMC cell connected to the bitline can be treated as a resistor and a capacitor. The worst case happens when only one cell is charging or discharging all capacitors.
  • Figure 5: $B_{\text{adds\_in}}$-bit adder tree topology for A/DIMC (right). $B_{\text{adds\_in}}=\text{ADC}_{\text{res}}$ for AIMC and $B_{\text{adds\_in}}=B_w$ for bit-serial DIMC. The structure of each node (a single-stage adder), assuming a ripple-carry adder (RCA), is shown in the top left. The gate-level circuit of a 1-bit full adder (FA) is shown in the bottom left.
  • ...and 7 more figures
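The adder tree in Figure 5 lends itself to a quick cost count. Assuming a binary tree over $N$ inputs ($N$ a power of two) in which stage $s$ adds pairs of $(B_{\text{adds\_in}} + s - 1)$-bit operands with an RCA, and costing every bit position as one full adder (a simplification; an LSB half adder would save one cell per adder), the full-adder count is $N_{\text{FA}} = \sum_{s=1}^{\log_2 N} \frac{N}{2^s}\,(B_{\text{adds\_in}} + s - 1)$. This is an illustrative sizing sketch, not the paper's area or energy equation, and `adder_tree_full_adders` is a hypothetical helper.

```python
def adder_tree_full_adders(n_inputs: int, b_in: int) -> int:
    """Count 1-bit full adders in a binary RCA adder tree (assumed sizing).

    Stage s (1-indexed) holds n_inputs / 2**s adders, each adding two
    (b_in + s - 1)-bit operands; every bit position is costed as one FA.
    """
    if n_inputs < 2 or n_inputs & (n_inputs - 1):
        raise ValueError("n_inputs must be a power of two >= 2")
    stages = n_inputs.bit_length() - 1  # log2(n_inputs)
    total = 0
    for s in range(1, stages + 1):
        adders = n_inputs >> s        # number of adders in stage s
        width = b_in + s - 1          # operand width entering stage s
        total += adders * width
    return total


# Hypothetical comparison: 32-input trees fed by 4-bit vs 8-bit operands,
# e.g. a 4-bit ADC output (AIMC) vs B_w = 8 (bit-serial DIMC).
print(adder_tree_full_adders(32, 4), adder_tree_full_adders(32, 8))
```

For 32 inputs this gives 150 full adders with 4-bit inputs versus 274 with 8-bit inputs, which shows how $B_{\text{adds\_in}}$ (ADC resolution for AIMC, weight precision for bit-serial DIMC) drives the digital periphery cost.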