Reducing Inference Energy Consumption Using Dual Complementary CNNs

Michail Kinnas, John Violos, Ioannis Kompatsiaris, Symeon Papadopoulos

TL;DR

The paper addresses the energy cost of on-device CNN inference by proposing a dual complementary CNN framework augmented with a memory component to bypass repeated inferences. Predictions are dynamically allocated between two small, complementary networks using a confidence-based score, with a memory module indexing prior results via perceptual fingerprints to avoid re-computation. Complementarity is formalized as $\text{complementarity}(a,b) = (n(a \cup b) - n(a \cap b) - |n(a) - n(b)|)/N$, and the threshold $\lambda$ is optimized as $\lambda^* = \arg\max_{0<\lambda<1} acc(\lambda)$ to balance accuracy and energy. Empirically, the approach yields up to $85.8\%$ energy reduction on CIFAR-10 and substantial gains on ImageNet, Intel, and FashionMNIST with minimal accuracy loss, demonstrating a hardware-agnostic, on-device solution for energy-efficient AI pipelines. The work highlights the practical impact of cooperative small CNNs and memory-aware inference for resource-constrained edge environments, along with directions for extending complementarity to other data modalities and confidence-based formulations.
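The two formulas above can be sketched in a few lines of NumPy. This is an illustrative sketch, not the authors' implementation. It assumes that $n(\cdot)$ counts correctly classified samples, that $N$ is the total number of samples, and that $acc(\lambda)$ is the accuracy of the combined pair when the second network's prediction replaces the first only if the first network's confidence is below $\lambda$ and the second is more confident. The function names and the grid search are hypothetical.

```python
import numpy as np


def complementarity(correct_a, correct_b):
    """(n(a ∪ b) - n(a ∩ b) - |n(a) - n(b)|) / N for two boolean correctness masks.

    Assumption: correct_x[i] is True when model x classifies sample i correctly.
    """
    a = np.asarray(correct_a, dtype=bool)
    b = np.asarray(correct_b, dtype=bool)
    if a.shape != b.shape or a.size == 0:
        raise ValueError("masks must be non-empty and have the same shape")
    n_union = np.count_nonzero(a | b)   # correct for at least one model
    n_inter = np.count_nonzero(a & b)   # correct for both models
    imbalance = abs(np.count_nonzero(a) - np.count_nonzero(b))
    return (n_union - n_inter - imbalance) / a.size


def best_threshold(conf_a, conf_b, pred_a, pred_b, labels, grid=None):
    """Grid search for lambda* = argmax acc(lambda) over 0 < lambda < 1.

    Assumption: the second model's prediction is used only when the first
    model's confidence is below lambda and the second model is more confident.
    """
    conf_a = np.asarray(conf_a, dtype=float)
    conf_b = np.asarray(conf_b, dtype=float)
    pred_a, pred_b, labels = (np.asarray(x) for x in (pred_a, pred_b, labels))
    if grid is None:
        grid = np.linspace(0.01, 0.99, 99)
    best_lam, best_acc = None, -1.0
    for lam in grid:
        use_b = (conf_a < lam) & (conf_b > conf_a)
        combined = np.where(use_b, pred_b, pred_a)
        acc = float(np.mean(combined == labels))
        if acc > best_acc:  # keep the smallest lambda among ties
            best_lam, best_acc = float(lam), acc
    return best_lam, best_acc
```

Under this reading the numerator equals twice the smaller of the two "only one model is correct" counts, so the score is high only when each network fixes a comparable share of the other's errors.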

Abstract

Energy efficiency of Convolutional Neural Networks (CNNs) has become an important area of research, with various strategies being developed to minimize the power consumption of these models. Previous efforts, including techniques like model pruning, quantization, and hardware optimization, have made significant strides in this direction. However, there remains a need for more effective on-device AI solutions that balance energy efficiency with model performance. In this paper, we propose a novel approach to reduce the energy requirements of CNN inference. Our methodology employs two small complementary CNNs that collaborate by covering each other's "weaknesses" in predictions. If the confidence of the first CNN's prediction is considered low, the second CNN is invoked with the aim of producing a higher-confidence prediction. This dual-CNN setup significantly reduces energy consumption compared to using a single large deep CNN. Additionally, we propose a memory component that retains previous classifications for identical inputs, bypassing the need to re-invoke the CNNs for the same input and further saving energy. Our experiments on a Jetson Nano computer demonstrate an energy reduction of up to 85.8%, achieved on modified datasets where each sample was duplicated once. These findings indicate that leveraging a complementary CNN pair along with a memory component effectively reduces inference energy while maintaining high accuracy.
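The inference flow described in the abstract (memory lookup, then the first CNN, then the second CNN only on low confidence) can be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' code: the models are plain callables returning class probabilities, the confidence score is the maximum probability, the more confident of the two predictions is kept, and an exact SHA-256 hash of the input stands in for the perceptual fingerprint mentioned in the TL;DR. The class and function names are hypothetical.

```python
import hashlib
from typing import Callable, Dict

import numpy as np

# Assumption: a model maps an image array to a vector of class probabilities.
Model = Callable[[np.ndarray], np.ndarray]


def max_prob_score(probs: np.ndarray) -> float:
    """Confidence score (assumed here): the largest class probability."""
    return float(np.max(probs))


class DualCnnWithMemory:
    def __init__(self, first: Model, second: Model, threshold: float,
                 score: Callable[[np.ndarray], float] = max_prob_score):
        if not 0.0 < threshold < 1.0:
            raise ValueError("threshold must lie strictly between 0 and 1")
        self.first, self.second = first, second
        self.threshold, self.score = threshold, score
        self.memory: Dict[str, int] = {}  # fingerprint -> cached label
        self.calls = {"first": 0, "second": 0, "memory": 0}

    @staticmethod
    def _fingerprint(image: np.ndarray) -> str:
        # Stand-in for a perceptual fingerprint: exact hash of shape, dtype, bytes.
        h = hashlib.sha256()
        h.update(str(image.shape).encode())
        h.update(str(image.dtype).encode())
        h.update(np.ascontiguousarray(image).tobytes())
        return h.hexdigest()

    def predict(self, image: np.ndarray) -> int:
        key = self._fingerprint(image)
        if key in self.memory:            # identical input seen before: skip both CNNs
            self.calls["memory"] += 1
            return self.memory[key]

        probs = self.first(image)
        self.calls["first"] += 1
        if self.score(probs) < self.threshold:   # low confidence: ask the second CNN
            probs_b = self.second(image)
            self.calls["second"] += 1
            if self.score(probs_b) > self.score(probs):
                probs = probs_b

        label = int(np.argmax(probs))
        self.memory[key] = label
        return label
```

The `calls` counters make it easy to see how often the second network and the memory are hit, which is the quantity that drives the energy savings in this kind of cascade.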

Paper Structure

This paper contains 28 sections, 16 equations, 16 figures, 3 tables.

Figures (16)

  • Figure 1: Overview of our proposed methodology, which consists of two complementary small CNNs with a memory component.
  • Figure 2: Complementarity based on the predictions of two CNN models.
  • Figure 3: Fine-tuning to increase complementarity.
  • Figure 4: Complementarity matrix of CIFAR-10 available pretrained PyTorch models.
  • Figure 5: Post-check evaluation with the Difference score function on the ImageNet validation dataset, using our configuration I3.
  • ...and 11 more figures