A 9T4R RRAM-Based ACAM for Analogue Template Matching at the Edge

Georgios Papandroulidakis, Shady Agwa, Ahmet Cirakoglu, Themis Prodromakis

TL;DR

The paper tackles the energy bottlenecks of data movement in edge AI by proposing an analogue template-matching accelerator built from a 9T4R RRAM-CMOS pixel (TXL) for ACAM. It introduces a dual-threshold per-cell design using hybrid RRAM-CMOS inverters to implement configurable matching windows, enabling near-sensor, analogue template matching with reduced data conversion. A 32×48 TXL-ACAM prototype in 180 nm CMOS with back-end-of-line RRAM demonstrates competitive energy efficiency, achieving approximately 0.16 pJ per match and 0.036 pJ per mismatch per cell at 66 MHz and 3 V, along with programmable read/write functionality and system-level peripherals. The work also analyzes process variability and presents a complete IC design with front-end sampling, analogue drivers, accumulators, sense amplifiers, and serial readout, underscoring the potential of memory-centric accelerators for energy-efficient edge classification with future scaling and benchmarking opportunities.

Abstract

The continuous shift of computational bottlenecks towards memory access and data transfer, especially for AI applications, creates an urgent need to re-engineer the fundamentals of computer architecture. Many edge computing applications, such as wearable and implantable medical devices, pose increasing challenges to conventional computing systems because of the strict area and power requirements at the edge. Emerging technologies such as Resistive RAM (RRAM) have shown promising momentum in developing neuro-inspired analogue computing paradigms capable of achieving high classification capability alongside high energy efficiency. In this work, we present a novel RRAM-based Analogue Content Addressable Memory (ACAM) for on-line analogue template matching applications. This ACAM-based template matching architecture targets energy-efficient classification where low energy is of utmost importance. We showcase a highly tuneable RRAM-based ACAM pixel, implemented using a commercial 180 nm CMOS technology and in-house RRAM technology, that exhibits low energy dissipation of approximately 0.036 pJ and 0.16 pJ for mismatch and match, respectively, at 66 MHz with a 3 V supply. A proof-of-concept system-level implementation based on this pixel design is also implemented in 180 nm.
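To put the per-cell figures in context, the following back-of-the-envelope sketch scales them to the 32×48 array mentioned in the TL;DR. It assumes every cell is evaluated in each search and counts cell energy only; peripheral circuits (drivers, accumulators, sense amplifiers, readout) are not included, and the `match_fraction` parameter is a hypothetical knob introduced purely for illustration.

```python
# Per-cell energies quoted in the abstract (66 MHz, 3 V supply), in picojoules.
E_MATCH_PJ = 0.16
E_MISMATCH_PJ = 0.036


def array_search_energy_pj(rows: int, cols: int, match_fraction: float) -> float:
    """Estimate the cell-only energy of one parallel search, in pJ.

    match_fraction is a hypothetical share of cells whose input falls
    inside their matching window (0.0 = all mismatch, 1.0 = all match).
    """
    if not 0.0 <= match_fraction <= 1.0:
        raise ValueError("match_fraction must lie in [0, 1]")
    cells = rows * cols
    matching = cells * match_fraction
    return matching * E_MATCH_PJ + (cells - matching) * E_MISMATCH_PJ


# Bounds for a 32 x 48 array: every cell mismatching vs. every cell matching.
all_mismatch = array_search_energy_pj(32, 48, 0.0)
all_match = array_search_energy_pj(32, 48, 1.0)
print(f"cell-only energy per search: {all_mismatch:.1f} pJ to {all_match:.1f} pJ")
```

Under these assumptions the cell-only energy of a full-array search lies between roughly 55 pJ and 246 pJ; the system-level figure reported by the authors may differ.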

Paper Structure (6 sections, 10 figures, 2 tables)

This paper contains 6 sections, 10 figures, 2 tables.

Figures (10)

  • Figure 1: Concept-level diagram of the TXL ACAM application in near-sensor implementations for signal classification. The online template matching application envisioned for the proposed RRAM-CMOS ACAM assumes that the signal is captured and pre-processed by appropriate analogue front-end circuitry and then encoded into an appropriately formatted input query used to search the RRAM-based ACAM classification engine back-end. An analogue input vector sampled from the single continuous signal is supplied to the classifier as a query key.
  • Figure 2: (a) General diagrams of the operation layers used for ACAM. (b) ACAMs are used to calculate the distance between the query input and the patterns stored in the ACAM. (c) General TXL array organisation: the query is distributed to the columns of the array to enable a parallel search operation, while the rows are organised as matchlines, each of which calculates the distance between the query and its stored template and sends it to the sense amplifiers (SAs). (d) Circuit schematic of the proposed 9-transistor-4-resistor (9T4R) pixel design for analogue template matching applications. The cell can map two thresholds (a low and a high threshold) through its non-volatile RRAM devices ($R_{M1}$ and $R_{M2}$). The hybrid RRAM-CMOS inverter design allows the threshold voltage of each inverter to be shifted depending on the resistive ratios $R_{M1}$-$R_{1}$ and $R_{M2}$-$R_{2}$ (through source degeneration of the MOSFET devices). The memory part of the cell comprises the two RRAM-CMOS inverters, while the per-cell readout circuit (3T Cell Comparison part) comprises the $M_{MEN}$, $M_{MLP}$ and $M_{MLN}$ devices. The $M_{PR1}$ and $M_{PR2}$ devices of the 2T write bypass circuit are used to access the RRAM for programming.
  • Figure 3: TXL modes of operation. Panels (a), (b) and (c) show three different match/mismatch cases, while in (d) the programming mode is enabled, with the 2T2R and 3T branches not used and the 1T1R programming paths enabled. The 9T4R cell effectively operates in two modes: template matching mode and programming mode. During template matching mode, the $M_{PR1}$ and $M_{PR2}$ nMOS devices are non-conductive and the cell uses its RRAM conductance states to compare an input with its internal matching window. During programming mode, the $M_{PR1}$ and $M_{PR2}$ nMOS devices are conductive (one at a time), providing a low-resistance path that closes the appropriate circuit with the peripherals so that the RRAM devices can be written to specific conductance states.
  • Figure 4: Representation of the 9T4R cell layout alongside its dimensions. The main parts of the cell are the 7T CMOS part and the 2T programming nMOS devices, alongside the BEOL-integrated RRAM devices. The lower left area of the cell is allocated for polysilicon integration for the emulator array, where conventional fixed polysilicon resistors are used instead of BEOL RRAM for testing and calibration. The lower right area is used for the biasing resistors $R_{1}$ and $R_{2}$.
  • Figure 5: Simulation results showcasing a typical matching window response for an input ramp. The analogue voltage input $V_{IN}$ is swept in transient simulation for different values of $R_{M2}$. The traces from the parametric analysis of the RRAM values are overlaid to show the movement of the upper threshold $TH_{HI}$. The simulations were performed with Spectre in the Cadence Virtuoso environment using 180 nm CMOS technology and in-house RRAM models Messaris2017. It can be observed that, as $R_{M2}$ increases, the high threshold decreases (parametric analysis of $TH_{HI}$), resulting in an increasingly smaller matching window, assuming that the lower threshold $TH_{LO}$ does not shift due to changes in the $R_{M1}$ value. The output of the 9T4R cell for each RRAM configuration is shown in the $MLO$ set of traces. The charging and reset dynamics are controlled by the enable signal, the matchline accumulator reset and the operating voltage $V_{DD}$ = 3.3 V.
  • ...and 5 more figures
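
The matching-window behaviour described in the captions above can be sketched as a purely behavioural model: each cell stores a low and a high threshold, a cell matches when its input falls inside that window, and each row (matchline) accumulates its cells' results. This is an idealised sketch, not the authors' circuit. The function names, the example voltages, the inclusive window boundaries and the use of a simple match count as the matchline score are all assumptions made for illustration; the real matchline output is an analogue quantity.

```python
from typing import List, Sequence, Tuple

# One cell's matching window as (TH_LO, TH_HI), in volts (illustrative values).
Window = Tuple[float, float]


def cell_matches(v_in: float, window: Window) -> bool:
    """Idealised cell: match when TH_LO <= v_in <= TH_HI (boundaries assumed inclusive)."""
    th_lo, th_hi = window
    return th_lo <= v_in <= th_hi


def matchline_scores(query: Sequence[float],
                     templates: Sequence[Sequence[Window]]) -> List[int]:
    """Count matching cells per row; the query is shared by all rows (parallel search)."""
    scores = []
    for row in templates:
        if len(row) != len(query):
            raise ValueError("each template row needs one window per query element")
        scores.append(sum(cell_matches(v, w) for v, w in zip(query, row)))
    return scores


def best_template(query: Sequence[float],
                  templates: Sequence[Sequence[Window]]) -> int:
    """Index of the row with the most matching cells (first row wins ties)."""
    scores = matchline_scores(query, templates)
    return max(range(len(scores)), key=scores.__getitem__)


# Two hypothetical stored templates, three cells each.
templates = [
    [(0.5, 1.0), (1.0, 1.5), (2.0, 2.5)],
    [(1.5, 2.0), (0.2, 0.8), (1.0, 1.4)],
]
query = [0.7, 1.2, 2.6]
scores = matchline_scores(query, templates)
print(scores, best_template(query, templates))  # prints: [2, 0] 0
```

In this toy example the first template matches in two of its three cells and the second in none, so the first row is selected. Narrowing a window, for instance by lowering a cell's upper threshold as Figure 5 shows for increasing $R_{M2}$, makes that cell more selective.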