Optimizing Binary and Ternary Neural Network Inference on RRAM Crossbars using CIM-Explorer

Rebecca Pelke, José Cubero-Cascante, Nils Bosbach, Niklas Degener, Florian Idrizi, Lennart M. Reimann, Jan Moritz Joseph, Rainer Leupers

TL;DR

This work tackles the challenge of efficiently running binary and ternary neural networks on RRAM crossbars within a Computing-in-Memory framework, addressing non-idealities such as device variability and limited weight states. It introduces CIM-Explorer, a modular, TVM-based compiler toolkit with Larq frontend support, multiple crossbar mappings (digital and analog), and a design-space exploration flow to assess accuracy under varying crossbar sizes, ADC settings, and compute modes. The results demonstrate how different mappings, variability profiles, and ADC resolutions influence inference accuracy on 256×256 crossbars, showing that differential mappings often offer the best robustness and that modest ADC resolutions (as low as 3 bits) can suffice for larger BNNs, while TNNs require distinct trade-offs. Overall, CIM-Explorer provides an end-to-end framework to guide early design decisions, mapping choices, and compiler configurations for CIM-based BNN/TNN deployment, with an emphasis on accuracy under hardware non-idealities and extensibility toward real hardware integration.

Abstract

Using Resistive Random Access Memory (RRAM) crossbars in Computing-in-Memory (CIM) architectures offers a promising solution to overcome the von Neumann bottleneck. Due to non-idealities like cell variability, RRAM crossbars are often operated in binary mode, utilizing only two states: Low Resistive State (LRS) and High Resistive State (HRS). Binary Neural Networks (BNNs) and Ternary Neural Networks (TNNs) are well-suited for this hardware due to their efficient mapping. Existing software projects for RRAM-based CIM typically focus on only one aspect: compilation, simulation, or Design Space Exploration (DSE). Moreover, they often rely on classical 8-bit quantization. To address these limitations, we introduce CIM-Explorer, a modular toolkit for optimizing BNN and TNN inference on RRAM crossbars. CIM-Explorer includes an end-to-end compiler stack, multiple mapping options, and simulators, enabling a DSE flow for accuracy estimation across different crossbar parameters and mappings. CIM-Explorer can accompany the entire design process, from early accuracy estimation for specific crossbar parameters, to selecting an appropriate mapping, and compiling BNNs and TNNs for a finalized crossbar chip. In DSE case studies, we demonstrate the expected accuracy for various mappings and crossbar parameters. CIM-Explorer can be found on GitHub.
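To make the idea of mapping binary and ternary weights onto two-state cells more concrete, here is a minimal sketch of a differential mapping followed by a uniform ADC. It is not CIM-Explorer's API or the authors' implementation: the function names (`map_differential`, `column_dot`, `adc`), the normalized conductance values, the Gaussian variability model, and the choice to encode a ternary zero as two HRS cells are all assumptions made for illustration.

```python
import random

# Normalized cell conductances (assumed values, chosen for illustration only).
G_LRS = 1.0    # Low Resistive State
G_HRS = 0.25   # High Resistive State


def map_differential(weights):
    """Map each weight to a (G_plus, G_minus) pair of two-state cells.

    +1 -> (LRS, HRS), -1 -> (HRS, LRS). The ternary 0 -> (HRS, HRS)
    encoding is an assumption of this sketch.
    """
    table = {+1: (G_LRS, G_HRS), -1: (G_HRS, G_LRS), 0: (G_HRS, G_HRS)}
    try:
        return [table[w] for w in weights]
    except KeyError as exc:
        raise ValueError(f"unsupported weight: {exc.args[0]}") from None


def column_dot(inputs, pairs, sigma=0.0, rng=None):
    """Analog dot product of one differential column pair.

    inputs: values in {0, 1}, modelling word lines that are off or on.
    sigma:  relative standard deviation of a hypothetical Gaussian
            conductance variability (0.0 means ideal cells).
    """
    rng = rng or random.Random(0)
    i_plus = i_minus = 0.0
    for x, (g_plus, g_minus) in zip(inputs, pairs):
        # Each cell conductance is perturbed independently.
        i_plus += x * g_plus * (1.0 + rng.gauss(0.0, sigma))
        i_minus += x * g_minus * (1.0 + rng.gauss(0.0, sigma))
    # The HRS offset cancels in the difference; rescale to weight units.
    return (i_plus - i_minus) / (G_LRS - G_HRS)


def adc(value, bits, full_scale):
    """Uniform ADC with 2**bits levels over [-full_scale, +full_scale]."""
    step = 2.0 * full_scale / (2 ** bits - 1)
    clipped = max(-full_scale, min(full_scale, value))
    return round((clipped + full_scale) / step) * step - full_scale


weights = [+1, -1, +1, +1]
inputs = [1, 1, 0, 1]
ideal = column_dot(inputs, map_differential(weights))
noisy = column_dot(inputs, map_differential(weights), sigma=0.1)
print(ideal, noisy, adc(noisy, bits=3, full_scale=4.0))
```

With `sigma=0.0` the differential read-out reproduces the integer dot product, because the HRS contribution is identical on both columns and drops out of the difference. Raising `sigma` or lowering `bits` gives a rough feel for the kind of accuracy trade-off a DSE flow would sweep over, although a real study would evaluate full networks rather than a single column.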

Paper Structure

This paper contains 22 sections, 4 equations, 9 figures, 4 tables.

Figures (9)

  • Figure 1: Overview of the individual modules of CIM-Explorer. At compile time, the BNN or TNN is optimized for execution on a crossbar (I). During runtime, the weights are prepared according to the compute mode (II). Several backends can be used for execution, e.g., different simulators (III). A DSE tool automates finding optimal crossbar parameters and mappings (IV).
  • Figure 2: The CIM architecture components considered in this work.
  • Figure 3: The interfaces of the toolkit. The functional interface separates compilation and mapping. The crossbar interface separates mapping and simulation.
  • Figure 4: The compiler pipeline including pre-trained inputs, a new frontend, partitioning, scheduling primitives and lowering passes, and code generation.
  • Figure 5: Scheduling primitives are applied to the initial loop nest of Conv2D.
  • ...and 4 more figures