SQUAT: Stateful Quantization-Aware Training in Recurrent Spiking Neural Networks

Sreyes Venkatesh, Razvan Marinescu, Jason K. Eshraghian

TL;DR

Two QAT schemes for stateful neurons are introduced: (i) a uniform quantization strategy, an established method for weight quantization, and (ii) threshold-centered quantization, which allocates exponentially more quantization levels near the firing threshold.

Abstract

Weight quantization is used to deploy high-performance deep learning models on resource-limited hardware, enabling the use of low-precision integers for storage and computation. Spiking neural networks (SNNs) share the goal of enhancing efficiency, but adopt an 'event-driven' approach to reduce the power consumption of neural network inference. While extensive research has focused on weight quantization, quantization-aware training (QAT), and their application to SNNs, the precision reduction of state variables during training has been largely overlooked, potentially diminishing inference performance. This paper introduces two QAT schemes for stateful neurons: (i) a uniform quantization strategy, an established method for weight quantization, and (ii) threshold-centered quantization, which allocates exponentially more quantization levels near the firing threshold. Our results show that increasing the density of quantization levels around the firing threshold improves accuracy across several benchmark datasets. We provide an ablation analysis of the effects of weight and state quantization, both individually and combined, and how they impact models. Our comprehensive empirical evaluation includes full precision, 8-bit, 4-bit, and 2-bit quantized SNNs, using QAT, stateful QAT (SQUAT), and post-training quantization methods. The findings indicate that the combination of QAT and SQUAT enhances performance the most, but given the choice of one or the other, QAT improves performance to a greater degree. These trends are consistent across all datasets. Our methods have been made available in our Python library snnTorch: https://github.com/jeshraghian/snntorch.
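
As a rough illustration of the two state-quantization schemes, the sketch below builds a uniform grid and a threshold-centered grid of permissible membrane-potential levels, then snaps a value to its nearest level. The function names, the state range [0, 2], the threshold of 1.0, and the rule that the gap to the threshold halves with each level are all assumptions chosen for illustration; they are not taken from the paper's equations or from the snnTorch API.

```python
import numpy as np

def uniform_levels(n_bits, lo, hi):
    """2**n_bits evenly spaced levels over [lo, hi]."""
    return np.linspace(lo, hi, 2 ** n_bits)

def threshold_centered_levels(n_bits, lo, hi, threshold, base=2.0):
    """2**n_bits levels that crowd toward `threshold` (hypothetical rule).

    Half of the levels sit below the threshold and half above it. The
    distance of each level from the threshold shrinks by a factor of
    `base` per step, so spacing is finest next to the threshold.
    """
    half = (2 ** n_bits) // 2
    frac = base ** -np.arange(half, dtype=float)   # 1, 1/2, 1/4, ...
    below = threshold - (threshold - lo) * frac    # starts at lo
    above = threshold + (hi - threshold) * frac    # starts at hi
    return np.sort(np.concatenate([below, above]))

def quantize(u, levels):
    """Snap each entry of u to the nearest permissible level."""
    u = np.asarray(u, dtype=float)
    idx = np.abs(u[..., None] - levels).argmin(axis=-1)
    return levels[idx]

# 3-bit example (8 levels), assumed threshold of 1.0 and range [0, 2].
uni = uniform_levels(3, 0.0, 2.0)
thr = threshold_centered_levels(3, 0.0, 2.0, threshold=1.0)

# Count how many levels fall within 0.3 of the threshold in each scheme.
near_uniform = int((np.abs(uni - 1.0) <= 0.3).sum())
near_centered = int((np.abs(thr - 1.0) <= 0.3).sum())
```

Under these assumptions the threshold-centered grid places more of its eight levels close to the threshold than the uniform grid does, at the cost of coarser resolution far from it. That is the trade-off the abstract describes: precision is spent where a small error is most likely to flip a spike decision.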

Paper Structure (18 sections, 10 equations, 4 figures, 7 tables)

Figures (4)

  • Figure 1: A graphical depiction of stateful quantization. On the left, a membrane potential trajectory is depicted in full precision. The state can be quantized via either uniform or exponential quantization. A 3-bit (8 levels) quantization scheme is illustrated. In uniform quantization, the permissible levels are evenly distributed. In exponential quantization, the permissible levels are clustered closely around the threshold and become more widely spaced further from it. A 'straight-through estimator' (STE) is also depicted, which addresses the non-differentiability of quantization.
  • Figure 2: FashionMNIST performance. Top row: (i) 8/4/2-b uniformly distributed states across QAT (N-b weights, flt32 states), SQUAT (flt32 weights, N-b states), SQUAT+QAT (N-b weights, N-b states), and PTQ-S (flt32 weights, N-b states), PTQ-W (N-b weights, flt32 states), PTQ-W+S (N-b weights, N-b states). (ii) SQUAT vs PTQ uniformly and exponentially distributed states (SQUAT and QAT are both applied across N-b weights and states, and compared against PTQ of N-b states and weights). Bottom row: (iii) 8/4/2-b exponentially distributed states across QAT (N-b weights, flt32 states), SQUAT (flt32 weights, N-b states), SQUAT+QAT (N-b weights, N-b states), and PTQ-S (flt32 weights, N-b states), PTQ-W (N-b weights, flt32 states), PTQ-W+S (N-b weights, N-b states). (iv) Comparison between exponential and uniformly distributed states: SQUAT+QAT are used across N-b states and weights, then N-b PTQ is used across N-b states and weights.
  • Figure 3: SHD performance. Top row: (i) 8/4/2-b uniformly distributed states across QAT (N-b weights, flt32 states), SQUAT (flt32 weights, N-b states), SQUAT+QAT (N-b weights, N-b states), and PTQ-S (flt32 weights, N-b states), PTQ-W (N-b weights, flt32 states), PTQ-W+S (N-b weights, N-b states). (ii) SQUAT vs PTQ uniformly and exponentially distributed states (SQUAT and QAT are both applied across N-b weights and states, and compared against PTQ of N-b states and weights). Bottom row: (iii) 8/4/2-b exponentially distributed states across QAT (N-b weights, flt32 states), SQUAT (flt32 weights, N-b states), SQUAT+QAT (N-b weights, N-b states), and PTQ-S (flt32 weights, N-b states), PTQ-W (N-b weights, flt32 states), PTQ-W+S (N-b weights, N-b states). (iv) Comparison between exponential and uniformly distributed states: SQUAT+QAT are used across N-b states and weights, then N-b PTQ is used across N-b states and weights.
  • Figure 4: DVS Gesture Dataset performance. Top row: (i) 8/4/2-b uniformly distributed states across QAT (N-b weights, flt32 states), SQUAT (flt32 weights, N-b states), SQUAT+QAT (N-b weights, N-b states), and PTQ-S (flt32 weights, N-b states), PTQ-W (N-b weights, flt32 states), PTQ-W+S (N-b weights, N-b states). (ii) SQUAT vs PTQ uniformly and exponentially distributed states (SQUAT and QAT are both applied across N-b weights and states, and compared against PTQ of N-b states and weights). Bottom row: (iii) 8/4/2-b exponentially distributed states across QAT (N-b weights, flt32 states), SQUAT (flt32 weights, N-b states), SQUAT+QAT (N-b weights, N-b states), and PTQ-S (flt32 weights, N-b states), PTQ-W (N-b weights, flt32 states), PTQ-W+S (N-b weights, N-b states). (iv) Comparison between exponential and uniformly distributed states: SQUAT+QAT are used across N-b states and weights, then N-b PTQ is used across N-b states and weights.
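
The captions for Figures 2 to 4 repeat the same six conditions at three bit widths and two state-level distributions. The sketch below enumerates that grid as plain data so the conditions can be read side by side. The condition names, precisions, and bit widths come from the captions; the variable names, dictionary keys, and the "training" versus "post-training" labels are illustrative choices, not part of the paper or of snnTorch.

```python
from itertools import product

BIT_WIDTHS = (8, 4, 2)
STATE_LEVELS = ("uniform", "exponential")

# condition name: (weight precision, state precision, when quantization is applied)
CONDITIONS = {
    "QAT":       ("N-b",   "flt32", "training"),
    "SQUAT":     ("flt32", "N-b",   "training"),
    "SQUAT+QAT": ("N-b",   "N-b",   "training"),
    "PTQ-S":     ("flt32", "N-b",   "post-training"),
    "PTQ-W":     ("N-b",   "flt32", "post-training"),
    "PTQ-W+S":   ("N-b",   "N-b",   "post-training"),
}

def resolve(precision, bits):
    """Replace the 'N-b' placeholder with a concrete bit width."""
    return f"{bits}-b" if precision == "N-b" else precision

def experiment_grid():
    """List every (state distribution, bit width, condition) combination."""
    rows = []
    for levels, bits, (name, (w, s, stage)) in product(
        STATE_LEVELS, BIT_WIDTHS, CONDITIONS.items()
    ):
        rows.append({
            "state_levels": levels,
            "bits": bits,
            "condition": name,
            "weights": resolve(w, bits),
            "states": resolve(s, bits),
            "stage": stage,
        })
    return rows

grid = experiment_grid()
```

The state-level distribution has no effect on rows whose states stay in flt32 (QAT and PTQ-W). They are kept in both halves of the grid only because the captions list them in both the uniform and the exponential panels.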