MD-SNN: Membrane Potential-aware Distillation on Quantized Spiking Neural Network

Donghyun Lee, Abhishek Moitra, Youngeun Kim, Ruokai Yin, Priyadarshini Panda

TL;DR

MD-SNN addresses accuracy loss in quantized spiking neural networks by distilling membrane potential distributions from a full-precision teacher into a quantized student. It combines two distillation paths (membrane potentials and logits) with a versatile teacher framework in which a single teacher guides students at multiple timesteps. The approach achieves competitive accuracy on static and neuromorphic datasets and, when evaluated on SpikeSim, yields up to 14.85x lower EDAP, 2.64x higher TOPS/W, and 6.19x higher TOPS/mm² than floating-point SNNs at iso-accuracy on N-Caltech101. This supports energy-efficient deployment of quantized SNNs across varying latency-accuracy requirements without training a separate teacher for each timestep.

Abstract

Spiking Neural Networks (SNNs) offer a promising and energy-efficient alternative to conventional neural networks, thanks to their sparse binary activations. However, they face challenges regarding memory and computation overhead due to complex spatio-temporal dynamics and the necessity for multiple backpropagation computations across timesteps during training. To mitigate this overhead, compression techniques such as quantization are applied to SNNs. Yet, naively applying quantization to SNNs introduces a mismatch in membrane potential, a crucial factor for the firing of spikes, resulting in accuracy degradation. In this paper, we introduce Membrane-aware Distillation on quantized Spiking Neural Network (MD-SNN), which leverages membrane potential to mitigate discrepancies after weight, membrane potential, and batch normalization quantization. To our knowledge, this study represents the first application of membrane potential knowledge distillation in SNNs. We validate our approach on various datasets, including CIFAR10, CIFAR100, N-Caltech101, and TinyImageNet, demonstrating its effectiveness for both static and dynamic data scenarios. Furthermore, for hardware efficiency, we evaluate MD-SNN on the SpikeSim platform, finding that MD-SNNs achieve 14.85X lower energy-delay-area product (EDAP), 2.64X higher TOPS/W, and 6.19X higher TOPS/mm² compared to floating-point SNNs at iso-accuracy on the N-Caltech101 dataset.
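
The membrane potential mismatch described in the abstract can be illustrated with a toy simulation. The sketch below is not the paper's implementation: it assumes a single leaky integrate-and-fire (LIF) neuron with a hard reset, a symmetric uniform quantizer, and hypothetical values for the leak, threshold, and clipping ranges. It only shows how quantizing the weight and the membrane potential makes the potential trace drift away from the full-precision one.

```python
def quantize(x, bits, x_max):
    """Symmetric uniform quantizer (an assumed scheme, not taken from the paper)."""
    levels = 2 ** (bits - 1) - 1           # e.g. 7 positive levels for 4 bits
    step = x_max / levels
    clipped = max(-x_max, min(x_max, x))   # clip to the representable range
    return round(clipped / step) * step


def lif_run(inputs, weight, leak=0.5, threshold=1.0, bits=None, u_max=2.0):
    """Simulate one LIF neuron over the input sequence.

    If `bits` is given, both the weight and the membrane potential are
    quantized. Returns (membrane potentials before reset, spikes).
    All hyperparameters here are hypothetical illustration values.
    """
    if bits is not None:
        weight = quantize(weight, bits, 1.0)
    u, potentials, spikes = 0.0, [], []
    for x in inputs:
        u = leak * u + weight * x          # leaky integration
        if bits is not None:
            u = quantize(u, bits, u_max)   # quantized membrane potential
        potentials.append(u)
        fired = 1 if u >= threshold else 0
        spikes.append(fired)
        if fired:
            u = 0.0                        # hard reset after a spike
    return potentials, spikes


def mean_squared_gap(a, b):
    """Mean squared difference between two membrane potential traces."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


if __name__ == "__main__":
    stimulus = [0.3, 0.9, 0.1, 0.7]
    fp_u, _ = lif_run(stimulus, weight=0.83)
    q_u, _ = lif_run(stimulus, weight=0.83, bits=4)
    print("full-precision potentials:", fp_u)
    print("4-bit potentials:         ", q_u)
    print("mean squared gap:         ", mean_squared_gap(fp_u, q_u))
```

Because the potential is compared against a threshold at every timestep, even a small gap of this kind can flip a spike decision, which is the motivation the abstract gives for distilling membrane potentials directly.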

Paper Structure

This paper contains 23 sections, 5 equations, 6 figures, 5 tables.

Figures (6)

  • Figure 1: Impact of knowledge distillation on membrane potential distributions. (a) Before distillation: significant mismatch between Floating-Point (FP) and 4-bit quantized models. (b) After distillation: aligned distributions through MD-SNN. (c) Performance comparison on CIFAR100, where MD-SNN (green star) surpasses both the quantized student and FP teacher models.
  • Figure 2: MD-SNN architecture and training paradigm. (a) Dual distillation paths using membrane potentials and logits between FP teacher and quantized student. (b) Versatile teacher framework: single $T=4$ teacher guides multiple students via temporal membrane alignment, contrasted with traditional one-to-one knowledge distillation requiring separate teachers per timestep.
  • Figure 3: Hardware efficiency comparison of floating-point and MD-SNN approaches on the N-Caltech101 dataset. Lower EDAP values indicate better overall hardware efficiency, while higher TOPS/W and TOPS/mm² values indicate better energy and area efficiency, respectively. FP32 refers to full precision; MS-N denotes MD-SNN with N-bit quantization for both weights and membrane potentials.
  • Figure 4: Teacher versatility on CIFAR-10. (a) Accuracy improvements of MD-SNN over quantized baselines using a single $T=4$ teacher for all timestep configurations. Quantized baselines (dashed line) are models trained with MINT (Yin et al.) for $t \in \{1,2,3,4\}$ timesteps individually, without any distillation. MD-SNN results (solid line) are trained with distillation. (b) Training FLOPs comparison showing a 30% reduction with the versatile teacher approach. "Traditional" refers to individual timestep teacher-student distillation (see Fig. 2(b)).
  • Figure 5: Ablation study on membrane distillation granularity. (a) Three extraction strategies for membrane potential: Conv-wise (after each Conv-BN layer), Block-wise (after each residual block), and Group-wise (after each stage). (b) GPU memory consumption and (c) training time comparison across datasets. (d) Accuracy comparison on CIFAR-100 and N-Caltech101.
  • ...and 1 more figure
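
The dual distillation paths and the versatile teacher described in the Figure 2 caption can be sketched as a loss function. This is a minimal sketch under stated assumptions, not the paper's formulation: it assumes mean squared error for the membrane path, temperature-softened KL divergence for the logit path, equal hypothetical weights `alpha` and `beta`, and a temporal alignment that simply pairs a student running `t` timesteps with the first `t` timesteps of the teacher. The function and parameter names are illustrative only.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax with a temperature."""
    m = max(logits)
    exps = [math.exp((z - m) / temperature) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def membrane_mse(teacher_u, student_u):
    """MSE between membrane potentials shaped [timestep][neuron]."""
    total, count = 0.0, 0
    for t_row, s_row in zip(teacher_u, student_u):
        for tu, su in zip(t_row, s_row):
            total += (tu - su) ** 2
            count += 1
    return total / count


def md_distillation_loss(teacher_u, student_u, teacher_logits, student_logits,
                         alpha=1.0, beta=1.0, temperature=4.0):
    """Membrane path + logit path (assumed loss forms and weights)."""
    t = len(student_u)
    if t > len(teacher_u):
        raise ValueError("student uses more timesteps than the teacher provides")
    # Assumed temporal alignment: reuse the teacher's first t timesteps.
    aligned_teacher_u = teacher_u[:t]
    membrane_term = membrane_mse(aligned_teacher_u, student_u)
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    logit_term = kl_divergence(p_teacher, p_student)
    return alpha * membrane_term + beta * logit_term


if __name__ == "__main__":
    # One T=4 teacher trace guiding a student that runs only 2 timesteps.
    teacher_u = [[0.2, 0.8], [0.5, 0.1], [0.9, 0.4], [0.3, 0.3]]
    student_u = [[0.0, 1.0], [0.5, 0.0]]
    loss = md_distillation_loss(teacher_u, student_u,
                                teacher_logits=[2.0, 0.5, -1.0],
                                student_logits=[1.0, 1.0, 0.0])
    print("distillation loss:", loss)
```

In practice this term would be added to the usual task loss; the sketch omits that term and any per-layer weighting, since neither is specified in the text above.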