Memory-Free and Parallel Computation for Quantized Spiking Neural Networks

Dehao Zhang, Shuai Wang, Yichen Xiao, Wenjie Wei, Yimeng Shan, Malu Zhang, Yang Yang

TL;DR

This work tackles the performance drop in quantized spiking neural networks by identifying the loss of historical information as the critical bottleneck when membrane potentials are quantized to low bit-widths. It introduces memory-free quantization to preserve the full spatio-temporal history without storing membrane potentials, and pairs it with a parallel training and asynchronous inference framework to accelerate computation. The resulting MFP-QSNN delivers state-of-the-art or competitive accuracy on static and neuromorphic datasets while reducing memory usage and increasing training speed, which makes it a practical route to deploying high-performance QSNNs on resource-limited edge devices.

Abstract

Quantized Spiking Neural Networks (QSNNs) offer superior energy efficiency and are well-suited for deployment on resource-limited edge devices. However, limiting the bit-width of weights and membrane potentials results in a notable performance decline. In this study, we first identify a new underlying cause for this decline: the loss of historical information due to the quantized membrane potential. To tackle this issue, we introduce a memory-free quantization method that captures all historical information without directly storing membrane potentials, resulting in better performance with lower memory requirements. To further improve computational efficiency, we propose a parallel training and asynchronous inference framework that greatly increases training speed and energy efficiency. We combine the proposed memory-free quantization and parallel computation methods to develop a high-performance and efficient QSNN, named MFP-QSNN. Extensive experiments show that MFP-QSNN achieves state-of-the-art performance on various static and neuromorphic image datasets while requiring less memory and training faster. The efficiency and efficacy of MFP-QSNN highlight its potential for energy-efficient neuromorphic computing.
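
To make the claimed failure mode concrete, the following is a minimal sketch of a single leaky integrate-and-fire (LIF) neuron whose residual membrane potential is quantized before being carried to the next timestep. It is an illustration only, not the paper's quantizer: the uniform quantizer, the decay of 0.5, the threshold of 1.0, the hard reset, and the names `quantize`, `lif_trace`, and `history_error` are all assumptions introduced here.

```python
import numpy as np

def quantize(u, bits, u_max=1.0):
    """Assumed uniform quantizer: 2**bits levels spanning [-u_max, u_max]."""
    step = 2.0 * u_max / (2 ** bits - 1)
    idx = np.round((np.clip(u, -u_max, u_max) + u_max) / step)
    return idx * step - u_max

def lif_trace(inputs, bits=None, decay=0.5, v_th=1.0):
    """Simulate one LIF neuron; return (spikes, residual potentials)."""
    u = 0.0
    spikes, residuals = [], []
    for x in inputs:
        u = decay * u + x            # leaky integration: history plus new input
        s = float(u >= v_th)         # fire when the threshold is reached
        u = u * (1.0 - s)            # hard reset after a spike
        if bits is not None:
            u = quantize(u, bits)    # only this low-bit value reaches step t+1
        spikes.append(s)
        residuals.append(u)
    return np.array(spikes), np.array(residuals)

def history_error(inputs, bits, **kwargs):
    """Mean absolute gap between full-precision and quantized residuals."""
    _, full = lif_trace(inputs, None, **kwargs)
    _, quant = lif_trace(inputs, bits, **kwargs)
    return float(np.mean(np.abs(full - quant)))

if __name__ == "__main__":
    x = np.full(20, 0.1)             # weak, sub-threshold input current
    for b in (8, 4, 2):
        print(b, "bits -> mean residual error", history_error(x, b))
```

With a weak constant input, the 2-bit neuron snaps its residual to the nearest coarse level at every step, so the value it carries forward no longer reflects the accumulated input; at 8 bits the residual tracks the full-precision trace closely. This is the kind of information loss that motivates avoiding a stored low-bit membrane potential altogether.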

Paper Structure

This paper contains 13 sections, 9 equations, 3 figures, 2 tables.

Figures (3)

  • Figure 1: Limited spatio-temporal interaction: (a) Performance under different bit-widths of weights and membrane potentials. (b) The proportion of spike inputs and residual membrane voltages under different bit-widths.
  • Figure 2: Comparative analysis of dynamics between LIF neurons and our MFP-QSNN. During the training phase, $M_{\tau}$ ensures that MFP-QSNN supports parallel training without the explicit need to store $U[t-1]$ for historical information exchange. In the inference phase, $M_{\tau}$ is integrated into $V_{th}$ at each timestep, maintaining asynchronous inference characteristics.
  • Figure 3: A comparative analysis of model memory requirements and performance across various methods on the CIFAR-10 dataset. Our method attains a recognition accuracy of 95.9% with a memory footprint of only 1.48 MB.
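
The Figure 2 caption states that $M_{\tau}$ lets training proceed in parallel without storing $U[t-1]$. This section does not give the form of $M_{\tau}$, so the sketch below uses a hypothetical stand-in: a lower-triangular matrix of exponential decay weights, which is one common way to unroll a reset-free leaky integrator over all timesteps at once. The function names and the scalar decay `tau` are assumptions for illustration, and the threshold integration used at inference is not modeled.

```python
import numpy as np

def decay_matrix(T, tau):
    """Hypothetical stand-in for M_tau: M[t, s] = tau**(t - s) if s <= t, else 0."""
    t = np.arange(T)
    diff = t[:, None] - t[None, :]
    return np.where(diff >= 0, tau ** np.maximum(diff, 0), 0.0)

def parallel_potentials(inputs, tau):
    """Compute all T potentials in one matrix product; no U[t-1] is carried."""
    inputs = np.asarray(inputs, dtype=float)
    return decay_matrix(len(inputs), tau) @ inputs

def sequential_potentials(inputs, tau):
    """Reference recurrence U[t] = tau * U[t-1] + I[t] (no reset)."""
    u, out = 0.0, []
    for x in np.asarray(inputs, dtype=float):
        u = tau * u + x
        out.append(u)
    return np.array(out)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    currents = rng.normal(size=(8, 3))   # 8 timesteps, 3 neurons
    par = parallel_potentials(currents, tau=0.5)
    seq = sequential_potentials(currents, tau=0.5)
    print(np.allclose(par, seq))
```

Under these assumptions the matrix product and the step-by-step recurrence give the same potentials, but the matrix form exposes every timestep to the hardware at once and never materializes a stored state between steps.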