Tensor Decomposition Based Attention Module for Spiking Neural Networks

Haoyu Deng; Ruijie Zhu; Xuerui Qiu; Yule Duan; Malu Zhang; Liangjian Deng

TL;DR

This work tackles the inefficiency of conventional attention in spiking neural networks (SNNs) by introducing Projected Full Attention (PFA), a tensor-decomposition-based module whose rank is tunable through the connecting factor R. PFA consists of two parts, Linear Projection of Spike Tensor (LPST) and Attention Map Composing (AMC), which together provide temporal-channel-spatial attention with a parameter count that grows only linearly. The authors provide theoretical guidance on selecting R and extensive empirical validation, reporting state-of-the-art results on static and dynamic datasets as well as improvements in neuromorphic image generation tasks. The approach offers a principled, efficient path to richer attention in SNNs and holds potential for deployment on neuromorphic hardware.

Abstract

The attention mechanism has proven to be an effective way to improve spiking neural networks (SNNs). However, although the input data flow of current SNNs is split into tensors for processing on GPUs, no previous work exploits the properties of tensors when implementing an attention module. This inspires us to rethink current SNNs from the perspective of tensor-relevant theories. Using tensor decomposition, we design the projected full attention (PFA) module, which demonstrates excellent results with linearly growing parameters. Specifically, PFA is composed of the linear projection of spike tensor (LPST) module and the attention map composing (AMC) module. In LPST, we start by compressing the original spike tensor into three projected tensors using a single property-preserving strategy with learnable parameters for each dimension. Then, in AMC, we exploit the inverse procedure of the tensor decomposition process to combine the three tensors into the attention map using a so-called connecting factor. To validate the effectiveness of the proposed PFA module, we integrate it into the widely used VGG and ResNet architectures for classification tasks. Our method achieves state-of-the-art performance on both static and dynamic benchmark datasets, surpassing existing SNN models with Transformer-based and CNN-based backbones.
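The two-stage pipeline described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the mean-pooling projection, the elementwise weights `w_t`, `w_c`, `w_s`, the sigmoid gate, and the function name `pfa_sketch` are all assumptions made for brevity. What the sketch does take from the text is the overall shape of the idea: three per-dimension projections, a connecting factor R, an attention map built as a sum of R rank-one terms (the inverse of a CP decomposition), and a Hadamard product with the input.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def pfa_sketch(x, w_t, w_c, w_s):
    """Illustrative attention over a spike tensor x of shape (T, C, N).

    N stands for the flattened spatial size H * W.
    w_t, w_c, w_s are hypothetical learnable weights of shapes
    (R, T), (R, C), (R, N); R plays the role of the connecting factor.
    """
    # LPST-like step (assumed form): compress x along the other two
    # dimensions to get one projection per dimension.
    p_t = x.mean(axis=(1, 2))  # shape (T,)
    p_c = x.mean(axis=(0, 2))  # shape (C,)
    p_s = x.mean(axis=(0, 1))  # shape (N,)

    # Turn each projection into R vectors with elementwise weights.
    u = w_t * p_t  # shape (R, T)
    v = w_c * p_c  # shape (R, C)
    s = w_s * p_s  # shape (R, N)

    # AMC-like step: sum of R rank-one outer products, i.e. the
    # reconstruction direction of a CP decomposition.
    logits = np.einsum("rt,rc,rn->tcn", u, v, s)
    attn = sigmoid(logits)  # assumed gating nonlinearity

    # Fuse the attention map with the input by Hadamard product.
    return attn * x, attn


rng = np.random.default_rng(0)
T, C, N, R = 4, 8, 16, 3
spikes = (rng.random((T, C, N)) < 0.3).astype(np.float64)
weights = [rng.normal(size=(R, d)) for d in (T, C, N)]
refined, attention = pfa_sketch(spikes, *weights)
print(refined.shape, attention.shape)
```

In this sketch the hypothetical weights hold R * (T + C + N) values, so the count grows linearly in R and in each dimension. Whether that matches the parameterization used in the paper is not established here.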

Paper Structure

This paper contains 20 sections, 16 equations, 11 figures, 4 tables.

Introduction
Related Works
Motivation and Method
Motivation
Projected Full-Attention (PFA)
Linear Projection of Spike Tensor (LPST)
Attention Map Composing (AMC)
Parameter and Computational Cost Analysis
Theoretical Analysis on $R$
Experiment
Datasets and Training Details
Datasets
Loss function
Network Architectures
Comparison with Existing SOTA Works
...and 5 more sections

Figures (11)

Figure 1: Accuracy on CIFAR10 (left) and CIFAR100 (right). Compared with other methods, PFA significantly improves network performance.
Figure 2: A simple comparison of parameter growth among TCJA, TA-SNN, and PFA. The parameter scale of PFA increases linearly. In this figure, the channel number is fixed at 128.
Figure 3: The detailed workflow of PFA. The input tensor is first sent to the Linear Projection of Spike Tensor (LPST) module, which generates three projections and splits each of them into its corresponding vectors. In the Attention Map Composing (AMC) module, these vectors are composed into the final attention map through the inverse process of CP decomposition. The attention map is fused with the input tensor by Hadamard product to obtain the refined tensor.
Figure 4: Comparison of accuracy between the vanilla VGG network and VGG network with PFA modules on training and validation sets. While the training set accuracy remains consistently high for both models, the validation set accuracy shows a significant improvement with the addition of PFA modules, suggesting its efficacy in mitigating overfitting.
Figure 5: A toy example of the effect of different ranks on the approximation outcome. Error is measured under the $\ell_2$ norm. The error starts to converge when the rank exceeds 30.
...and 6 more figures
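
The rank effect summarized in the Figure 5 caption can be illustrated with a simpler matrix analogue. The sketch below is an assumption-laden stand-in, not the paper's experiment: it uses a truncated SVD of a synthetic low-rank matrix rather than a CP decomposition of a tensor, and the function name `truncation_errors`, the matrix sizes, and the noise level are all hypothetical. It shows the same qualitative behaviour: the approximation error drops as the rank grows and flattens once the rank reaches the structure actually present in the data.

```python
import numpy as np


def truncation_errors(matrix, max_rank):
    """Frobenius-norm error of the best rank-r approximation, r = 1..max_rank."""
    u, s, vt = np.linalg.svd(matrix, full_matrices=False)
    errors = []
    for r in range(1, max_rank + 1):
        # Keep only the r largest singular triplets.
        approx = (u[:, :r] * s[:r]) @ vt[:r, :]
        errors.append(float(np.linalg.norm(matrix - approx)))
    return errors


rng = np.random.default_rng(0)
true_rank = 5  # hypothetical rank of the synthetic target
target = rng.normal(size=(40, true_rank)) @ rng.normal(size=(true_rank, 30))
target += 0.01 * rng.normal(size=target.shape)  # small noise floor

errors = truncation_errors(target, max_rank=10)
for rank, err in enumerate(errors, start=1):
    print(rank, round(err, 4))
```

Beyond the true rank, extra components only fit the noise, which is one way to read a curve that stops improving past a certain rank.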
