TT-SNN: Tensor Train Decomposition for Efficient Spiking Neural Network Training

Donghyun Lee, Ruokai Yin, Youngeun Kim, Abhishek Moitra, Yuhang Li, Priyadarshini Panda

TL;DR

This work introduces Tensor Train Decomposition for Spiking Neural Networks (TT-SNN), a method that reduces model size through trainable weight decomposition, resulting in reduced storage, FLOPs, and latency, and proposes a parallel computation pipeline as an alternative to the typical sequential tensor computation.

Abstract

Spiking Neural Networks (SNNs) have gained significant attention as a potentially energy-efficient alternative to standard neural networks, owing to their sparse binary activations. However, SNNs suffer from memory and computation overhead due to spatio-temporal dynamics and multiple backpropagation computations across timesteps during training. To address this issue, we introduce Tensor Train Decomposition for Spiking Neural Networks (TT-SNN), a method that reduces model size through trainable weight decomposition, resulting in reduced storage, FLOPs, and latency. In addition, we propose a parallel computation pipeline as an alternative to the typical sequential tensor computation, which can be flexibly integrated into various existing SNN architectures. To the best of our knowledge, this is the first application of tensor decomposition in SNNs. We validate our method using both static and dynamic datasets, CIFAR10/100 and N-Caltech101, respectively. We also propose a TT-SNN-tailored training accelerator to fully harness the parallelism in TT-SNN. Our results on the N-Caltech101 dataset demonstrate substantial reductions in parameter size (7.98X), FLOPs (9.25X), training time (17.7%), and training energy (28.3%), with negligible accuracy degradation.
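
To give a rough sense of where the parameter savings of a weight decomposition come from, the sketch below counts the weights of one dense 3$\times$3 convolution and of a four-core factorization of it. The core layout (a 1$\times$1 core, a 3$\times$1 core, a 1$\times$3 core, and a final 1$\times$1 core), the rank value, the channel sizes, and the function names are assumptions chosen for illustration; they are not taken from the paper, and the resulting ratio is not one of the reported results.

```python
def dense_conv_params(c_in, c_out, k=3):
    """Weights in a dense k x k convolution (bias ignored)."""
    return c_out * c_in * k * k


def tt_conv_params(c_in, c_out, rank, k=3):
    """Weights in an assumed four-core factorization of a k x k convolution.

    Assumed layout (illustrative, not confirmed by the source):
      1x1 core: c_in -> rank
      kx1 core: rank -> rank
      1xk core: rank -> rank
      1x1 core: rank -> c_out
    """
    first = rank * c_in
    vertical = rank * rank * k
    horizontal = rank * rank * k
    last = c_out * rank
    return first + vertical + horizontal + last


# Hypothetical layer: 64 input channels, 64 output channels, rank 16.
dense = dense_conv_params(64, 64)
tt = tt_conv_params(64, 64, rank=16)
ratio = dense / tt

print(f"dense: {dense}, factorized: {tt}, ratio: {ratio:.2f}x")
```

Under these assumptions the saving grows as the rank shrinks relative to the channel counts, which is why the rank acts as the main knob trading compression against accuracy.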

Paper Structure

This paper contains 10 sections, 6 equations, 5 figures, 4 tables, and 1 algorithm.

Figures (5)

  • Figure 1: Illustration of TT-SNN modules. Weight dimensions follow the PyTorch ordering, i.e., (output channel, input channel, kernel height, kernel width). (a) Basic convolution weights with a $3\times3$ kernel. (b) Sequential computation of TT-cores with asymmetric kernels, the traditional method. (c) Proposed Parallel TT-module (PTT). The two asymmetric kernels are computed in parallel on the output of the first sub-convolution. The parallel computation of PTT can be viewed as a $3\times3$ kernel without its four corner values.
  • Figure 2: Illustration of the Half TT (HTT) format for further compression. (a) Instead of sharing all weights across timesteps, HTT uses only some of the sub-convolutions. (b) In the spatio-temporal computation dimension of the SNN, the HTT module occupies a half-diagonal area because it uses only part of the weights across timesteps.
  • Figure 3: Illustration of the design of our training accelerator for efficiently mapping the PTT-SNN and HTT-SNN. MemP denotes the membrane potential.
  • Figure 4: (a) Training energy costs of STT-, PTT-, and HTT-based SNNs compared to the baseline SNN on ResNet18 and ResNet34. The results are calculated based on the SATA accelerator design. (b) Training energy cost improvements of PTT and HTT over STT on our proposed multi-cluster accelerator design.
  • Figure 5: Performance trends as the number of timesteps varies. (a) Accuracy and (b) training time of STT, PTT, and HTT during training.
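
The Figure 1(c) remark that the parallel computation can be seen as a 3$\times$3 kernel without its four corners can be checked numerically. The sketch below is a minimal single-channel illustration in plain Python, assuming that the outputs of the two parallel branches are summed before any nonlinearity; the helper names, kernel values, and input are hypothetical and are not the authors' implementation.

```python
def conv2d_same(image, kernel):
    """Zero-padded 'same' cross-correlation of a 2-D list with an odd-sized kernel."""
    h, w = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    ph, pw = kh // 2, kw // 2
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            acc = 0.0
            for a in range(kh):
                for b in range(kw):
                    y, x = i + a - ph, j + b - pw
                    if 0 <= y < h and 0 <= x < w:  # outside the image counts as zero
                        acc += image[y][x] * kernel[a][b]
            out[i][j] = acc
    return out


def merge_to_cross(vertical, horizontal):
    """Embed a 3x1 and a 1x3 kernel into one 3x3 kernel (corners stay zero)."""
    cross = [[0.0] * 3 for _ in range(3)]
    for a in range(3):
        cross[a][1] += vertical[a][0]    # centre column
    for b in range(3):
        cross[1][b] += horizontal[0][b]  # centre row
    return cross


# Hypothetical kernels and input, using small integer values so sums are exact.
vertical = [[1.0], [2.0], [3.0]]   # 3x1 kernel
horizontal = [[4.0, 5.0, 6.0]]     # 1x3 kernel
image = [[float(4 * r + c) for c in range(4)] for r in range(4)]

# Parallel branches: both kernels see the same input; outputs are summed (assumption).
branch_v = conv2d_same(image, vertical)
branch_h = conv2d_same(image, horizontal)
parallel = [[v + h for v, h in zip(rv, rh)] for rv, rh in zip(branch_v, branch_h)]

# Equivalent single pass with the merged cross-shaped kernel.
cross = merge_to_cross(vertical, horizontal)
merged = conv2d_same(image, cross)

print(cross)
print(parallel == merged)
```

The equivalence relies only on the linearity of convolution, so it extends to multi-channel kernels channel pair by channel pair. It would no longer hold if a spiking nonlinearity were applied to each branch separately before the sum.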