Dynamic Activation with Knowledge Distillation for Energy-Efficient Spiking NN Ensembles
Orestis Konstantaropoulos, Theodoris Mallios, Maria Papadopouli
TL;DR
The paper tackles the high energy demands of foundation AI models by introducing Spiking Neural Ensemble (SNE), a framework that distills knowledge from a large ANN teacher into multiple SNN students arranged as an ensemble. Each student learns a distinct subset of the teacher's feature space, and at inference the ensemble can activate only a subset of its students to trade accuracy against energy efficiency. By combining knowledge distillation, disentanglement of the teacher's feature space (with the teacher either frozen or fine-tuned), and flexible dropout strategies, SNE cuts computation by up to about 20x in FLOPs with only about a 2% accuracy drop on CIFAR-10, and it is more robust to noise than its ANN teacher. The approach offers a practical path toward energy-efficient, neuromorphic-inspired inference for edge and embedded AI applications.
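To make the distillation step more concrete, here is a minimal Python sketch under stated assumptions. It is not the authors' implementation. It assumes that the teacher's feature vector is split into contiguous, near-equal, disjoint slices (a naive stand-in for SNE's informed partitioning). It also assumes that each student outputs a real-valued vector the size of its slice, without modeling spiking dynamics, and that each student is trained with a mean-squared-error loss against its slice. The helper names `partition_indices` and `distillation_losses` are hypothetical.

```python
from typing import List, Sequence


def partition_indices(dim: int, num_students: int) -> List[List[int]]:
    """Split feature indices 0..dim-1 into contiguous, disjoint slices, one per student.

    Naive placeholder for SNE's informed partitioning: slice sizes differ by at most one.
    """
    if not 1 <= num_students <= dim:
        raise ValueError("need 1 <= num_students <= dim")
    base, extra = divmod(dim, num_students)
    parts, start = [], 0
    for k in range(num_students):
        size = base + (1 if k < extra else 0)  # the first `extra` students get one extra index
        parts.append(list(range(start, start + size)))
        start += size
    return parts


def distillation_losses(teacher_feat: Sequence[float],
                        student_outputs: Sequence[Sequence[float]],
                        parts: List[List[int]]) -> List[float]:
    """Mean-squared error between each student's output and its slice of the teacher features."""
    if len(student_outputs) != len(parts):
        raise ValueError("need exactly one output per student partition")
    losses = []
    for out, idx in zip(student_outputs, parts):
        if len(out) != len(idx):
            raise ValueError("student output size must match its partition")
        target = [teacher_feat[i] for i in idx]
        losses.append(sum((o - t) ** 2 for o, t in zip(out, target)) / len(idx))
    return losses


# Hypothetical example: a 10-dimensional teacher feature shared among 3 students.
teacher = [float(i) for i in range(10)]
parts = partition_indices(len(teacher), 3)
students = [[1.0, 1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]]
print(parts)
print(distillation_losses(teacher, students, parts))
```

With a 10-dimensional teacher feature and 3 students, `partition_indices(10, 3)` yields slices of sizes 4, 3 and 3. In the actual system the partition would come from the informed disentanglement procedure rather than from index order.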
Abstract
While foundation AI models excel at tasks such as classification and decision-making, their high energy consumption makes them unsuitable for energy-constrained applications. Inspired by the brain's efficiency, spiking neural networks (SNNs) have emerged as a viable alternative owing to their event-driven nature and compatibility with neuromorphic chips. This work introduces a novel system that combines knowledge distillation and ensemble learning to bridge the performance gap between artificial neural networks (ANNs) and SNNs. A foundation AI model acts as a teacher network, guiding smaller student SNNs organized into an ensemble called Spiking Neural Ensemble (SNE). SNE disentangles the teacher's knowledge so that each student specializes in predicting a distinct aspect of it while processing the same input. The core innovation of SNE is the adaptive activation of a subset of the ensemble's SNN models, leveraging knowledge distillation enhanced with an informed partitioning (disentanglement) of the teacher's feature space. By dynamically activating only a subset of the student SNNs, the system balances accuracy and energy efficiency, achieving substantial energy savings with minimal accuracy loss. Moreover, SNE is significantly more efficient than the teacher network, reducing computational requirements by up to 20x with only a 2% drop in accuracy on the CIFAR-10 dataset. The disentanglement procedure improves accuracy by up to 2.4% on CIFAR-10 compared with other partitioning schemes. Finally, we compare SNE with its ANN teacher under noisy conditions and show that SNE is more robust. In summary, SNE offers a promising new direction for energy-constrained applications.
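The dynamic-activation idea can be sketched in the same hedged way. The sketch below rests on four assumptions:
- each student reconstructs its own disjoint slice of the teacher's feature vector;
- slices belonging to inactive students are filled with zeros, a simple dropout-style choice without rescaling;
- a linear classification head maps the assembled vector to class logits;
- every student costs the same number of FLOPs per inference.

The functions `assemble_features`, `predict` and `relative_cost` are hypothetical illustrations, not part of the paper.

```python
from typing import List, Sequence, Set


def assemble_features(student_outputs: Sequence[Sequence[float]],
                      parts: List[List[int]],
                      dim: int,
                      active: Set[int]) -> List[float]:
    """Place each active student's output into its slice; inactive slices stay zero."""
    feat = [0.0] * dim
    for k in sorted(active):
        if not 0 <= k < len(parts):
            raise ValueError(f"unknown student index {k}")
        if len(student_outputs[k]) != len(parts[k]):
            raise ValueError("student output size must match its partition")
        for i, v in zip(parts[k], student_outputs[k]):
            feat[i] = v
    return feat


def predict(feat: Sequence[float],
            weights: Sequence[Sequence[float]],
            bias: Sequence[float]) -> int:
    """Linear head: return the index of the largest logit."""
    logits = [sum(w * f for w, f in zip(row, feat)) + b
              for row, b in zip(weights, bias)]
    return max(range(len(logits)), key=logits.__getitem__)


def relative_cost(num_active: int, student_flops: float, teacher_flops: float) -> float:
    """Ensemble compute as a fraction of the teacher's, assuming equal per-student cost."""
    return num_active * student_flops / teacher_flops


# Hypothetical two-student ensemble over a 4-dimensional feature space.
parts = [[0, 1], [2, 3]]
outputs = [[1.0, 2.0], [3.0, 4.0]]
weights = [[1.0, 0.0, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0]]
bias = [0.0, 0.0]

# Activating more students fills in more of the feature vector and can change the prediction.
for active in ({0}, {0, 1}):
    feat = assemble_features(outputs, parts, dim=4, active=active)
    print(sorted(active), feat, predict(feat, weights, bias))
```

Calling `relative_cost` with the number of active students and per-model FLOP counts gives the fraction of the teacher's compute that the ensemble uses. For instance, 2 active students at 5 units each against a 100-unit teacher use 0.1 of the teacher's compute, a 10x reduction. These numbers are illustrative only and are not figures from the paper.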
