Workload-Balanced Pruning for Sparse Spiking Neural Networks

Ruokai Yin, Youngeun Kim, Yuhang Li, Abhishek Moitra, Nitin Satpute, Anna Hambitzer, Priyadarshini Panda

Abstract

Pruning for Spiking Neural Networks (SNNs) has emerged as a fundamental methodology for deploying deep SNNs on resource-constrained edge devices. Although existing pruning methods can provide extremely high weight sparsity for deep SNNs, this high weight sparsity brings a workload imbalance problem. Specifically, workload imbalance occurs when hardware units running in parallel are assigned different numbers of non-zero weights. This results in low hardware utilization and thus imposes longer latency and higher energy costs. In preliminary experiments, we show that sparse SNNs (~98% weight sparsity) can suffer from utilization as low as ~59%. To alleviate the workload imbalance problem, we propose u-Ticket, where we monitor and adjust the weight connections of the SNN during Lottery Ticket Hypothesis (LTH) based pruning, thus guaranteeing that the final ticket achieves optimal utilization when deployed onto the hardware. Experiments indicate that u-Ticket can guarantee up to 100% hardware utilization, reducing latency by up to 76.9% and energy cost by up to 63.8% compared to the non-utilization-aware LTH method.

Paper Structure (31 sections, 3 equations, 14 figures, 6 tables, 1 algorithm)

Figures (14)

  • Figure 1: Comparison between u-Ticket and state-of-the-art workload balance methods. Overall, u-Ticket recovers processing element (PE) utilization to as much as 100% for extremely sparse networks with 98% weight sparsity (here, VGG-16). Note that u-Ticket introduces no hardware area overhead and is thus the best fit for SNNs (↑: higher is better, ↓: lower is better).
  • Figure 2: Illustration of the concept of the proposed u-Ticket. Our u-Ticket consists of training (step1), pruning (step2), adjusting weight connections based on workload (step3), and re-initialization (step4). We repeat these steps for multiple rounds. Note that the standard LTH method consists only of training (step1), pruning (step2), and re-initialization (step4), and therefore does not consider the utilization of the pruned SNNs.
  • Figure 3: Example utilization and latency resulting from imbalanced and balanced workloads under the same model sparsity. With unstructured pruning, non-zero weights are randomly distributed across the four groups, leading to unbalanced workloads across PEs, as shown on the left side (PE0 is assigned four weights, while PE1 and PE2 are assigned only one each).
  • Figure 4: Sparsity and utilization across pruning rounds for the standard LTH method without utilization awareness. Pruning runs for 13 rounds on a VGG-16 trained for image classification on CIFAR10, with 16 PEs.
  • Figure 5: Illustration of the weight sparsity pattern (WSP), the position ID (PID), and the convolution ID (CID).
  • ...and 9 more figures
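
The imbalance described in the abstract and in the Figure 3 caption can be made concrete with a small sketch. It assumes a simplified cost model that is not taken from the paper: each PE processes one non-zero weight per cycle, and all PEs wait for the slowest one, so latency equals the largest per-PE workload and utilization is the fraction of PE-cycles doing useful work. The function name `pe_utilization` is hypothetical, and the workload of PE3 (not stated in the caption) is an assumed value chosen for illustration.

```python
def pe_utilization(nonzeros_per_pe):
    """Return (utilization, latency) under an assumed lock-step cost model.

    Assumption (not from the paper): each PE handles one non-zero weight
    per cycle and every PE waits until the most loaded PE finishes.
    """
    if not nonzeros_per_pe:
        raise ValueError("need at least one PE")
    latency = max(nonzeros_per_pe)          # cycles until the slowest PE is done
    if latency == 0:
        return 1.0, 0                       # nothing to do, nothing wasted
    useful = sum(nonzeros_per_pe)           # PE-cycles spent on real work
    available = len(nonzeros_per_pe) * latency
    return useful / available, latency


# PE0 = 4 and PE1 = PE2 = 1 follow the Figure 3 caption; PE3 = 2 is assumed.
imbalanced = [4, 1, 1, 2]
# The same eight non-zero weights, spread evenly across the four PEs.
balanced = [2, 2, 2, 2]

print(pe_utilization(imbalanced))  # (0.5, 4)
print(pe_utilization(balanced))    # (1.0, 2)
```

Under this toy model, both assignments have the same sparsity (eight non-zero weights), yet balancing them halves the latency and raises utilization from 50% to 100%, which is the effect the utilization-aware pruning targets.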