Reducing Inference Energy Consumption Using Dual Complementary CNNs
Michail Kinnas, John Violos, Ioannis Kompatsiaris, Symeon Papadopoulos
TL;DR
The paper addresses the energy cost of on-device CNN inference by proposing a dual complementary CNN framework augmented with a memory component that bypasses repeated inferences. Predictions are dynamically allocated between two small, complementary networks using a confidence-based score, while a memory module indexes prior results via perceptual fingerprints to avoid re-computation. Complementarity is formalized as $\text{complementarity}(a,b) = (n(a \cup b) - n(a \cap b) - |n(a) - n(b)|)/N$, and the threshold $\lambda$ is optimized as $\lambda^* = \arg\max_{0<\lambda<1} \mathrm{acc}(\lambda)$ to balance accuracy and energy. Empirically, the approach yields up to $85.8\%$ energy reduction on CIFAR-10 and substantial gains on ImageNet, Intel, and FashionMNIST with minimal accuracy loss, demonstrating a hardware-agnostic, on-device solution for energy-efficient AI pipelines. The work highlights the practical impact of cooperative small CNNs and memory-aware inference in resource-constrained edge environments, and it outlines directions for extending complementarity to other data modalities and confidence-based formulations.
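As a rough illustration of the complementarity score, the sketch below evaluates the formula on per-sample correctness flags. The summary does not define the symbols, so this assumes that $n(a)$ counts the samples model $a$ classifies correctly, $n(a \cup b)$ and $n(a \cap b)$ count the samples that at least one model or both models classify correctly, and $N$ is the total number of samples. The function name and its inputs are hypothetical.

```python
from typing import Sequence


def complementarity(correct_a: Sequence[bool], correct_b: Sequence[bool]) -> float:
    """Complementarity of two models from per-sample correctness flags.

    Assumed reading of the formula (not confirmed by the summary):
    n(.) counts correctly classified samples and N is the sample count.
    """
    if len(correct_a) != len(correct_b) or len(correct_a) == 0:
        raise ValueError("inputs must be non-empty and of equal length")

    n_total = len(correct_a)
    n_a = sum(bool(x) for x in correct_a)
    n_b = sum(bool(y) for y in correct_b)
    # Samples that at least one model gets right.
    n_union = sum(bool(x) or bool(y) for x, y in zip(correct_a, correct_b))
    # Samples that both models get right.
    n_inter = sum(bool(x) and bool(y) for x, y in zip(correct_a, correct_b))

    return (n_union - n_inter - abs(n_a - n_b)) / n_total


# Two models that are each right on disjoint halves of the data.
disjoint = complementarity([True, True, False, False], [False, False, True, True])
# Two models that are right on exactly the same samples.
identical = complementarity([True, True, False, False], [True, True, False, False])
```

Under this reading, the score rewards pairs whose correct predictions overlap little, and the $|n(a) - n(b)|$ term penalizes pairs in which one model simply dominates the other.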
Abstract
Energy efficiency of Convolutional Neural Networks (CNNs) has become an important area of research, with various strategies being developed to minimize the power consumption of these models. Previous efforts, including techniques like model pruning, quantization, and hardware optimization, have made significant strides in this direction. However, there remains a need for more effective on-device AI solutions that balance energy efficiency with model performance. In this paper, we propose a novel approach to reduce the energy requirements of CNN inference. Our methodology employs two small Complementary CNNs that collaborate by covering each other's "weaknesses" in predictions. If the confidence of the first CNN's prediction is considered low, the second CNN is invoked with the aim of producing a higher-confidence prediction. This dual-CNN setup significantly reduces energy consumption compared to using a single large deep CNN. Additionally, we propose a memory component that retains previous classifications for identical inputs, bypassing the need to re-invoke the CNNs for the same input and further saving energy. Our experiments on a Jetson Nano computer demonstrate an energy reduction of up to 85.8%, achieved on modified datasets where each sample was duplicated once. These findings indicate that leveraging a complementary CNN pair along with a memory component effectively reduces inference energy while maintaining high accuracy.
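The inference flow described above can be sketched roughly as follows. This is a minimal illustration under several assumptions, not the authors' implementation: the two models are hypothetical callables returning a `(label, confidence)` pair, the more confident of the two predictions is kept when the second CNN is invoked, and a SHA-256 digest of the raw input bytes stands in for the fingerprint used to recognize identical inputs.

```python
import hashlib
from typing import Callable, Dict, Tuple

Prediction = Tuple[int, float]  # (class label, confidence in [0, 1])
Model = Callable[[bytes], Prediction]  # hypothetical model interface


class DualCnnWithMemory:
    """Confidence-gated pair of classifiers with an exact-match memory."""

    def __init__(self, first: Model, second: Model, threshold: float) -> None:
        if not 0.0 < threshold < 1.0:
            raise ValueError("threshold must lie strictly between 0 and 1")
        self.first = first
        self.second = second
        self.threshold = threshold
        self.memory: Dict[str, Prediction] = {}
        # Invocation counters, useful as a crude proxy for energy spent.
        self.calls = {"first": 0, "second": 0, "memory": 0}

    def predict(self, image_bytes: bytes) -> Prediction:
        # Assumption: an exact hash replaces the paper's fingerprinting.
        key = hashlib.sha256(image_bytes).hexdigest()
        if key in self.memory:
            self.calls["memory"] += 1
            return self.memory[key]

        self.calls["first"] += 1
        best = self.first(image_bytes)
        if best[1] < self.threshold:
            # Low confidence: consult the complementary model.
            self.calls["second"] += 1
            other = self.second(image_bytes)
            if other[1] > best[1]:
                best = other

        self.memory[key] = best
        return best


# Stub models for demonstration only; real CNNs would go here.
def stub_first(x: bytes) -> Prediction:
    return (0, 0.9) if x.startswith(b"easy") else (1, 0.4)


def stub_second(x: bytes) -> Prediction:
    return (2, 0.8)


pipeline = DualCnnWithMemory(stub_first, stub_second, threshold=0.6)
easy = pipeline.predict(b"easy-sample")
hard = pipeline.predict(b"hard-sample")
repeat = pipeline.predict(b"hard-sample")  # served from memory
```

In this sketch a repeated input costs only a hash and a dictionary lookup, which is the mechanism by which duplicated samples would save energy.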
