Dynamic Activation with Knowledge Distillation for Energy-Efficient Spiking NN Ensembles

Orestis Konstantaropoulos, Theodoris Mallios, Maria Papadopouli

TL;DR

The paper tackles the energy demands of foundation AI models by introducing Spiking Neural Ensemble (SNE), a framework that distills knowledge from a large ANN teacher into multiple SNN students arranged as an ensemble. Each student learns a distinct subset of the teacher's feature space, and the ensemble can dynamically activate only a subset of students to balance accuracy and energy efficiency. Through a combination of knowledge distillation, feature-space disentanglement (frozen or fine-tuned teacher), and flexible dropout strategies, SNE achieves substantial energy savings (up to about 20x fewer FLOPs) with minimal accuracy loss on CIFAR-10, and shows improved robustness under noise. This approach offers a practical path toward deploying energy-efficient, neuromorphic-inspired inference for edge and embedded AI applications.

Abstract

While foundation AI models excel at tasks like classification and decision-making, their high energy consumption makes them unsuitable for energy-constrained applications. Inspired by the brain's efficiency, spiking neural networks (SNNs) have emerged as a viable alternative due to their event-driven nature and compatibility with neuromorphic chips. This work introduces a novel system that combines knowledge distillation and ensemble learning to bridge the performance gap between artificial neural networks (ANNs) and SNNs. A foundation AI model acts as a teacher network, guiding smaller student SNNs organized into an ensemble, called Spiking Neural Ensemble (SNE). SNE enables the disentanglement of the teacher's knowledge, allowing each student to specialize in predicting a distinct aspect of it while processing the same input. The core innovation of SNE is the adaptive activation of a subset of the SNN models in an ensemble, leveraging knowledge distillation enhanced with an informed partitioning (disentanglement) of the teacher's feature space. By dynamically activating only a subset of these student SNNs, the system balances accuracy and energy efficiency, achieving substantial energy savings with minimal accuracy loss. Moreover, SNE is significantly more efficient than the teacher network, reducing computational requirements by up to 20x with only a 2% drop in accuracy on the CIFAR-10 dataset. This disentanglement procedure achieves an accuracy improvement of up to 2.4% on the CIFAR-10 dataset compared to other partitioning schemes. Finally, we comparatively analyze SNE performance under noisy conditions, demonstrating enhanced robustness compared to its ANN teacher. In summary, SNE offers a promising new direction for energy-constrained applications.
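
As an illustration of the dynamic-activation idea described in the abstract, the following is a minimal sketch, not the authors' implementation. It assumes that each student emits a fixed-size feature vector, that the features of inactive students are replaced by zeros before the shared linear head, and that inactive students are simply never executed. The zero-fill choice and all names (`ensemble_features`, `classify`, the toy students) are hypothetical.

```python
from typing import Callable, List, Sequence

# A "student" here is any callable mapping an input to a feature vector.
# In SNE these would be spiking networks; plain functions stand in for them.
Student = Callable[[Sequence[float]], List[float]]


def ensemble_features(
    x: Sequence[float],
    students: Sequence[Student],
    dims: Sequence[int],
    active: Sequence[bool],
) -> List[float]:
    """Concatenate student features, running only the active students.

    Assumption: an inactive student contributes a zero vector of its
    feature dimension, so the classification head keeps a fixed input size.
    """
    features: List[float] = []
    for student, dim, is_active in zip(students, dims, active):
        if is_active:
            features.extend(student(x))   # compute (and pay for) this student
        else:
            features.extend([0.0] * dim)  # skipped: no computation performed
    return features


def classify(
    features: Sequence[float],
    weights: Sequence[Sequence[float]],
    bias: Sequence[float],
) -> int:
    """Apply a linear head and return the index of the largest logit."""
    logits = [
        sum(w * f for w, f in zip(row, features)) + b
        for row, b in zip(weights, bias)
    ]
    return max(range(len(logits)), key=lambda i: logits[i])


# Toy usage: two one-dimensional "students" and a two-class head.
calls = {"s1": 0, "s2": 0}

def s1(x):
    calls["s1"] += 1
    return [sum(x)]

def s2(x):
    calls["s2"] += 1
    return [max(x)]

W = [[1.0, 0.0], [0.0, 2.0]]
b = [0.0, 0.0]
x = [1.0, 2.0]

full = ensemble_features(x, [s1, s2], [1, 1], [True, True])
partial = ensemble_features(x, [s1, s2], [1, 1], [True, False])
```

In this sketch the energy saving comes from never calling the inactive student: `s2` runs once (for `full`) and is skipped for `partial`. How the actual system selects which students to activate is not specified here.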

Paper Structure

This paper contains 16 sections, 7 equations, 6 figures, 6 tables.

Figures (6)

  • Figure 1: Training a Student Ensemble. The input image is fed into the teacher network and into each student network, and each network computes a feature vector. The student features are concatenated, and the mean squared error (MSE) is computed between the teacher's feature vector and the concatenated feature vector of the student ensemble. The concatenated feature vector is also fed into a linear classification head, whose output vector is compared against the ground-truth vector for the example using the cross-entropy loss.
  • Figure 2: Fine-tuning of the Teacher Network. The teacher network is fine-tuned to naturally partition its feature space. In an online, iterative manner, the feature matrix of each batch is divided into $N$ clusters, where each cluster comprises a number of feature columns. After normalizing each cluster row, separability between clusters is increased with a loss term $L_{sim}$ based on the mean distances of each feature row of a cluster from the corresponding feature rows of all other clusters. The fine-tuning loss combines the loss for the primary classification task, $L_{CE}$, with a negatively weighted $L_{sim}$, promoting better feature clustering while maintaining classification accuracy.
  • Figure 3: Overview of the Spiking ResNet architecture.
  • Figure 4: Overview of the Spiking VGG architecture.
  • Figure 5: Performance of VGG and ResNet Architectures on CIFAR-10. Two- and four-student ensembles achieve performance comparable to the single-student SNN, with only a minor reduction in accuracy. The figure reports the total AC operations (over the entire ensemble) and the evaluation accuracy of the two architectures for different numbers of students active during inference. The notation "X/Y" denotes X active students in an ensemble of Y students.
  • ...and 1 more figures
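
The training signal described in the Figure 1 caption can be sketched as follows. This is a minimal pure-Python illustration under stated assumptions, not the authors' code: the caption names the two terms (MSE between teacher and concatenated student features, and cross-entropy on the linear head's output) but not how they are combined, so the weighted sum and the weight `alpha` are assumptions, as are all function names.

```python
import math
from typing import List, Sequence


def concat(student_feats: Sequence[Sequence[float]]) -> List[float]:
    """Concatenate the per-student feature vectors into one vector."""
    return [v for feats in student_feats for v in feats]


def mse(a: Sequence[float], b: Sequence[float]) -> float:
    """Mean squared error between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def linear(features, weights, bias) -> List[float]:
    """Linear classification head: one logit per weight row."""
    return [
        sum(w * f for w, f in zip(row, features)) + b0
        for row, b0 in zip(weights, bias)
    ]


def cross_entropy(logits: Sequence[float], label: int) -> float:
    """Cross-entropy for one example, via a stable log-sum-exp."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(l - m) for l in logits))
    return lse - logits[label]


def ensemble_loss(teacher_feats, student_feats, weights, bias, label,
                  alpha: float = 1.0) -> float:
    """Assumed combination: L = L_CE + alpha * L_MSE (alpha is hypothetical)."""
    z = concat(student_feats)
    return cross_entropy(linear(z, weights, bias), label) + alpha * mse(teacher_feats, z)


# Toy example: a 4-dimensional teacher vector split across two students.
teacher = [1.0, 2.0, 3.0, 4.0]
students = [[1.0, 2.0], [3.0, 2.0]]      # last entry is off by 2
W = [[0.0] * 4, [0.0] * 4]               # zero head gives uniform logits
b = [0.0, 0.0]
loss = ensemble_loss(teacher, students, W, b, label=0)
```

In the toy example the MSE term is 1.0 (one squared error of 4 averaged over 4 entries) and the cross-entropy term is ln 2, since the zero-weight head yields a uniform prediction over two classes.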
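
The teacher fine-tuning objective described in the Figure 2 caption, which combines $L_{CE}$ with a negatively weighted $L_{sim}$, can be written compactly as below. The weight symbol $\lambda$ and the name $L_{teacher}$ are assumptions introduced only for illustration; the caption does not give the weight's symbol or value.

$$
L_{teacher} = L_{CE} - \lambda \, L_{sim}, \qquad \lambda > 0
$$

Because $L_{sim}$ measures mean distances between clusters, subtracting it means that minimizing $L_{teacher}$ pushes the clusters apart while the $L_{CE}$ term preserves classification accuracy.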