Temporal-adaptive Weight Quantization for Spiking Neural Networks

Han Zhang; Qingyan Meng; Jiaqi Wang; Baiyu Chen; Zhengyu Ma; Xiaopeng Fan

TL;DR

TaWQ introduces temporally adaptive weight quantization for spiking neural networks, inspired by astrocyte-mediated synaptic modulation. Using calcium-dynamics-inspired updates, it maps full-precision weights to time-varying $1.58$-bit ternary weights $\{+1, 0, -1\}$ across timesteps, which correspond to excitatory, asynaptic, and inhibitory states under a shared temporal scaling. Across ImageNet, CIFAR, and neuromorphic datasets, TaWQ achieves substantial energy savings (often below 1 mJ) with negligible accuracy loss (often <1%). Its weight distributions approach maximum information entropy, indicating nearly full use of the ternary weight capacity. The approach extends to multi-bit variants (mTaWQ), compares favorably with post-training quantization, and remains compatible with non-Transformer spiking architectures and SHD speech tasks. Overall, TaWQ offers a principled, biology-inspired route to ultra-low-bit SNN quantization with practical implications for energy-efficient neuromorphic hardware.

Abstract

Weight quantization in spiking neural networks (SNNs) could further reduce energy consumption. However, quantizing weights without sacrificing accuracy remains challenging. In this study, inspired by astrocyte-mediated synaptic modulation in biological nervous systems, we propose Temporal-adaptive Weight Quantization (TaWQ), which combines weight quantization with temporal dynamics to adaptively allocate ultra-low-bit weights along the temporal dimension. Extensive experiments on static (e.g., ImageNet) and neuromorphic (e.g., CIFAR10-DVS) datasets demonstrate that TaWQ maintains high energy efficiency (4.12M, 0.63 mJ) while incurring a negligible quantization loss of only 0.22% on ImageNet.
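To make the ternary-weight and entropy statements above more concrete, the following is a minimal sketch and not the authors' method. It assumes a plain magnitude threshold for ternarization and a hand-picked list of per-timestep thresholds; the names `ternarize`, `ternary_entropy`, and `thresholds` are hypothetical, and the calcium-dynamics-inspired update and shared temporal scaling of TaWQ are not reproduced. It only illustrates that a three-valued weight carries at most $\log_2 3 \approx 1.585$ bits (the "1.58-bit" figure), and that the empirical entropy of the quantized weights shows how close a given quantizer comes to that bound.

```python
import numpy as np


def ternarize(w, threshold):
    """Map weights to {-1, 0, +1}: 0 where |w| <= threshold, sign(w) elsewhere.

    Assumption: simple magnitude thresholding, used only for illustration.
    """
    return (np.sign(w) * (np.abs(w) > threshold)).astype(np.int8)


def ternary_entropy(q):
    """Shannon entropy (in bits) of the empirical distribution over {-1, 0, +1}."""
    counts = np.array([(q == v).sum() for v in (-1, 0, 1)], dtype=np.float64)
    p = counts / counts.sum()
    p = p[p > 0]  # skip empty states so log2 is well defined
    return float(-(p * np.log2(p)).sum())


rng = np.random.default_rng(0)
w = rng.normal(0.0, 1.0, size=10_000)  # stand-in for full-precision weights

# Hypothetical per-timestep thresholds (not taken from the paper).
thresholds = [0.2, 0.43, 0.8]
max_entropy = np.log2(3)  # upper bound for any three-state code

for t, th in enumerate(thresholds):
    q_t = ternarize(w, th)
    print(f"t={t} threshold={th:.2f} entropy={ternary_entropy(q_t):.3f} "
          f"(max {max_entropy:.3f})")
```

In this toy setup the entropy is largest when the three states are used about equally often, which for standard-normal weights happens at a threshold near 0.43; thresholds that make one state dominate leave part of the ternary capacity unused.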
