A Hybrid Edge Classifier: Combining TinyML-Optimised CNN with RRAM-CMOS ACAM for Energy-Efficient Inference

Kieran Woodward, Eiman Kanjo, Georgios Papandroulidakis, Shady Agwa, Themis Prodromakis

TL;DR

This paper tackles extreme-edge inference by introducing a hybrid classifier that couples a tinyML front-end with an RRAM-CMOS ACAM back-end for template matching. Through knowledge distillation, pruning, and quantisation, the authors produce a compact student model that preserves substantial performance while enabling an energy-efficient analogue classification stage. The ACAM back-end performs low-energy, parallel pattern matching, with binary and distance-based template strategies supporting ultra-low-power inference. Together, the approach delivers up to ~800× reduction in MAC operations and ~792× energy savings compared with the teacher model, while maintaining practical accuracy for edge wearables and near-sensor sensing tasks.

Abstract

In recent years, the development of smart edge computing systems that process information locally has been on the rise. Many near-sensor machine learning (ML) approaches have been implemented to introduce accurate and energy-efficient template matching operations in resource-constrained edge sensing systems, such as wearables. To provide solutions that are viable for extreme edge cases, hybrid approaches combining conventional and emerging technologies have started to be proposed. Deep Neural Networks (DNNs) optimised for edge applications, alongside new approaches to computing (at both the device and architecture level), could be strong candidates for implementing edge ML solutions that aim at competitive classification accuracy while using a fraction of the power of conventional ML solutions. In this work, we propose a hybrid software-hardware edge classifier aimed at extreme edge near-sensor systems. The classifier consists of two parts: (i) an optimised digital tinyML network, working as a front-end feature extractor, and (ii) a back-end RRAM-CMOS analogue content addressable memory (ACAM), working as a final-stage template matching system. The combined hybrid system exhibits a competitive accuracy versus energy trade-off, with $E_{\text{front-end}} = 96.23\,\text{nJ}$ and $E_{\text{back-end}} = 1.45\,\text{nJ}$ for each classification operation, compared with $78.06\,\mu\text{J}$ for the original teacher model. This represents a 792-fold reduction, making it a viable solution for extreme edge applications.

Paper Structure

This paper contains 22 sections, 14 equations, 7 figures, 2 tables.

Figures (7)

  • Figure 1: Comparison of mean (red) and median (green) thresholding for each feature output from the front-end classifier.
  • Figure 2: Combined Deep Neural Network (DNN), for feature extraction and dimensionality reduction, with Analogue Content Addressable Memory (ACAM) back-end network [papandroulidakis2024_9t4r_arXiv], for final classification of the extracted feature map through analogue information processing.
  • Figure 3: Analogue Content Addressable Memory (ACAM) block diagram showcasing the main analogue computing blocks [papandroulidakis2024_9t4r_arXiv]. The block is effectively a two-layer network. The first layer is the ACAM module, which calculates in parallel the similarity of the input feature map to each of the pre-computed templates. The second layer senses the similarities and converts them into the proper voltage levels to pass through a final Winner-Take-All (WTA) network that computes the argmax function on the set of similarities. We assume that the feature map used as input to the ACAM is the output of the front-end feature extractor, with the ACAM employed as a final-layer classification network.
  • Figure 4: RRAM-CMOS-based Template piXeL (TXL) ACAM cell schematics. There are many versions of the TXL-ACAM technology, each with a specific set of trade-offs. (a) A 6T4R charging design aimed at ML applications with increased sparsity [papandroulidakis2024_9t4r_arXiv]. (b) A 3T1R precharging design aimed at applications that have strict area specifications and that require differentiability as a trait of the final-stage classification [Agwa2023_3T1R_ACAM].
  • Figure 5: Student CNN model architecture using a traditional softmax classifier.
  • ...and 2 more figures
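
The captions for Figures 1 and 3 describe per-feature thresholding of the front-end outputs and a back-end that scores the input against pre-computed templates before a Winner-Take-All argmax. The following is a minimal software sketch of that flow, not the authors' implementation. It assumes binary templates built by a per-class majority vote and a similarity score equal to the number of matching bits; the function names (`feature_thresholds`, `binarise`, `build_templates`, `classify`) and the majority-vote rule are hypothetical choices made for illustration, and the analogue behaviour of the RRAM-CMOS cells is not modelled.

```python
from statistics import mean, median


def feature_thresholds(features, how="mean"):
    """Per-feature threshold (mean or median) over a list of feature vectors."""
    reduce = mean if how == "mean" else median
    return [reduce(column) for column in zip(*features)]


def binarise(vector, thresholds):
    """Map each feature to 1 if it exceeds its threshold, else 0."""
    return [1 if v > t else 0 for v, t in zip(vector, thresholds)]


def build_templates(features, labels, thresholds):
    """Assumed rule: one binary template per class via per-bit majority vote."""
    rows_by_class = {}
    for vector, label in zip(features, labels):
        rows_by_class.setdefault(label, []).append(binarise(vector, thresholds))
    return {
        label: [1 if 2 * sum(column) >= len(rows) else 0 for column in zip(*rows)]
        for label, rows in rows_by_class.items()
    }


def classify(vector, thresholds, templates):
    """Score every template (matching bits), then take the argmax (WTA)."""
    bits = binarise(vector, thresholds)
    scores = {
        label: sum(b == t for b, t in zip(bits, template))
        for label, template in templates.items()
    }
    return max(scores, key=scores.get)
```

A typical use would call `feature_thresholds` on feature vectors collected from the front-end, pass the result to `build_templates` together with the class labels, and then call `classify` on each new feature vector. Switching `how` between `"mean"` and `"median"` mirrors the comparison named in the Figure 1 caption. In hardware the per-template scoring happens in parallel, whereas this sketch evaluates the templates one after another.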