Temporal-adaptive Weight Quantization for Spiking Neural Networks

Han Zhang, Qingyan Meng, Jiaqi Wang, Baiyu Chen, Zhengyu Ma, Xiaopeng Fan

TL;DR

TaWQ introduces temporally adaptive weight quantization for spiking neural networks, inspired by astrocyte-mediated synaptic modulation. By embedding calcium-dynamics-inspired updates, full-precision weights are mapped to time-varying $1.58$-bit ternary weights $\{+1, 0, -1\}$ across timesteps, enabling excitatory, inhibitory, and asynaptic states with a shared temporal scaling. Across ImageNet, CIFAR, and neuromorphic datasets, TaWQ achieves substantial energy savings (often below 1 mJ) with negligible accuracy loss (often <1%), and its weight distributions approach near-maximum information entropy, indicating full use of the ternary weight capacity. The approach extends to multi-bit variants (mTaWQ) and demonstrates favorable comparisons to post-training quantization, while maintaining compatibility with non-Transformer spiking architectures and SHD speech tasks. Overall, TaWQ offers a principled, biology-inspired route to ultra-low-bit SNN quantization with strong practical implications for energy-efficient neuromorphic hardware.

Abstract

Weight quantization in spiking neural networks (SNNs) could further reduce energy consumption. However, quantizing weights without sacrificing accuracy remains challenging. In this study, inspired by astrocyte-mediated synaptic modulation in biological nervous systems, we propose Temporal-adaptive Weight Quantization (TaWQ), which combines weight quantization with temporal dynamics to adaptively allocate ultra-low-bit weights along the temporal dimension. Extensive experiments on static (e.g., ImageNet) and neuromorphic (e.g., CIFAR10-DVS) datasets demonstrate that TaWQ maintains high energy efficiency (4.12M, 0.63 mJ) while incurring a negligible quantization loss of only 0.22% on ImageNet.
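
To make the idea of "allocating ternary weights along the temporal dimension" concrete, here is a minimal Python sketch. It is not the paper's TaWQ update rule, whose equations are not reproduced on this page. The leaky accumulator, the `decay` and `threshold` parameters, and the function names `ternarize` and `temporal_ternary_weights` are all hypothetical, chosen only to illustrate how a single full-precision weight could map to different values in $\{+1, 0, -1\}$ at different timesteps.

```python
def ternarize(value, threshold):
    """Map a real value to +1, 0, or -1 using a symmetric threshold."""
    if value > threshold:
        return 1
    if value < -threshold:
        return -1
    return 0


def temporal_ternary_weights(weights, timesteps, decay=0.5, threshold=0.5):
    """Return one ternary weight list per timestep.

    Hypothetical dynamics (an assumption, not taken from the paper):
    each weight drives a leaky accumulator, and the accumulator state
    is thresholded into {+1, 0, -1} at every timestep.
    """
    # Normalize by the largest magnitude so the threshold is scale-free.
    max_abs = max(abs(w) for w in weights) or 1.0
    normalized = [w / max_abs for w in weights]

    state = [0.0] * len(weights)
    schedule = []
    for _ in range(timesteps):
        # Leaky accumulation of the normalized weight (the "stimulus").
        state = [decay * s + x for s, x in zip(state, normalized)]
        schedule.append([ternarize(s, threshold) for s in state])
    return schedule


full_precision = [0.8, -0.3, 0.05, -1.0]
for t, ternary in enumerate(temporal_ternary_weights(full_precision, timesteps=3)):
    print(f"t={t}: {ternary}")
```

In this toy run, the second weight (-0.3) stays at 0 for the first two timesteps and switches to -1 at the third, once its accumulated state crosses the threshold, while the near-zero third weight remains in the "asynaptic" state 0 throughout.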

Paper Structure

This paper contains 33 sections, 21 equations, 9 figures, 13 tables.

Figures (9)

  • Figure 1: Schematic illustration of tripartite synapses. (a) An excitatory or inhibitory presynaptic neuron and a postsynaptic neuron, together with an astrocyte, form a tripartite synapse. (b) Schematic of the tripartite synapse structure: $\mathbf{W}_{syn}$ is the synaptic strength, $\mathbf{I}$ represents the stimulus received by astrocytes, $\mathbf{Ca}^{2+}$ denotes the calcium concentration, and $\mathbf{M}$ is a symbol (not a spike) designating whether astrocytes modulate synapses. (c) Calcium dynamics curve, which triggers $\mathbf{M}$ upon exceeding the threshold. (d) The synaptic strength varies over time under the modulation of astrocytes.
  • Figure 2: Schematic illustration of TaWQ. (a) Weights are quantized into time-varying 1.58-bit values {+1, 0, -1}, followed by a temporal-wise operation. (b) The state diagram of the weight quantization process, where $\mathbf{I}_{n}$, $\mathbf{C}_s$, and $\mathbf{W}_{tri,q}$ are the normalized stimulus, intermediate variable, and quantized weight, respectively.
  • Figure 3: Curves of the quantization function and surrogate gradient. "Cs" on the horizontal axis denotes $\mathbf{C}_s$ in Eq. (\ref{singlebittawq}). (a) The quantization function converts floating-point values into 1.58-bit ternary values $\{+1, 0, -1\}$. (b) Surrogate gradient under varying thresholds $C_{th}$.
  • Figure 4: Attention maps of the full-precision model and the 1.58-bit quantized model with TaWQ. The images are taken from ImageNet's validation set.
  • Figure 5: Information entropy and weight proportion of TaWQ-quantized QKFormer. 'Pp', 'Pz', and 'Pn' represent the probabilities of +1, 0, and -1 in the weight, respectively. The pink dashed line denotes the optimum.
  • ...and 4 more figures
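
The "1.58-bit" label and the entropy optimum mentioned for Figure 5 both follow from the fact that a three-valued symbol carries at most $\log_2 3 \approx 1.585$ bits, reached when +1, 0, and -1 are equally likely. The sketch below computes the Shannon entropy of a ternary weight list so that a measured distribution can be compared against that maximum. The function name `ternary_entropy` and the sample weight lists are illustrative assumptions, not data or code from the paper.

```python
import math


def ternary_entropy(weights):
    """Shannon entropy in bits of a list of weights drawn from {+1, 0, -1}."""
    n = len(weights)
    if n == 0:
        raise ValueError("weights must be non-empty")
    entropy = 0.0
    for value in (1, 0, -1):
        # Empirical probability of this value (Pp, Pz, Pn in Figure 5's terms).
        p = sum(1 for w in weights if w == value) / n
        if p > 0:
            entropy -= p * math.log2(p)
    return entropy


# Upper bound for any ternary distribution: log2(3) bits.
MAX_TERNARY_ENTROPY = math.log2(3)

# Illustrative (made-up) weight lists: one balanced, one dominated by zeros.
balanced = [1, 0, -1] * 100
skewed = [0] * 280 + [1] * 10 + [-1] * 10

print(f"maximum : {MAX_TERNARY_ENTROPY:.3f} bits")
print(f"balanced: {ternary_entropy(balanced):.3f} bits")
print(f"skewed  : {ternary_entropy(skewed):.3f} bits")
```

A distribution whose entropy sits close to `MAX_TERNARY_ENTROPY` uses all three weight states about equally, which is the sense in which near-maximum entropy indicates full use of the ternary capacity.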