Toward Relative Positional Encoding in Spiking Transformers

Changze Lv; Yansen Wang; Dongqi Han; Yifei Shen; Xiaoqing Zheng; Xuanjing Huang; Dongsheng Li

TL;DR

This work tackles the challenge of integrating relative positional information into spiking Transformers without violating binary spike dynamics. It introduces two methods: Gray-PE, which builds on a provable Gray-code distance property, and Log-PE, which adds a logarithmic distance bias to the spiking attention map. Both are extended to a 2D variant for image patches. The methods are implemented via XNOR-based self-attention and yield consistent gains across time-series forecasting, text classification, and patch-based image classification on multiple SNN backbones, supported by both theoretical analysis and experiments. The results suggest that relative positional encoding, when adapted to neuromorphic constraints, can improve sequential modeling in SNNs and broaden their applicability to real-world tasks.

Abstract

Spiking neural networks (SNNs) are bio-inspired networks that mimic how neurons in the brain communicate through discrete spikes. Their energy efficiency and temporal processing capabilities give them great potential across a range of tasks. SNNs with self-attention mechanisms (spiking Transformers) have recently advanced considerably. Inspired by traditional Transformers, several studies have shown that spiking absolute positional encoding helps capture sequential relationships in the input data, improving spiking Transformers on tasks such as sequential modeling and image classification. However, how to incorporate relative positional information into SNNs remains an open challenge. In this paper, we introduce several strategies to approximate relative positional encoding (RPE) in spiking Transformers while preserving the binary nature of spikes. First, we formally prove that encoding relative distances with Gray code ensures that the binary representations of positional indices maintain a constant Hamming distance whenever their decimal values differ by a power of two, and we propose Gray-PE based on this property. Second, we propose another RPE method, Log-PE, which incorporates the logarithmic form of the relative distance matrix directly into the spiking attention map. Furthermore, we extend both RPE methods to a two-dimensional form, making them suitable for processing image patches. We evaluate our RPE methods on time series forecasting, text classification, and patch-based image classification, and the experimental results show performance gains from incorporating our RPE methods across many architectures. Our results provide fresh perspectives on designing spiking Transformers to advance their sequential modeling capability, thereby expanding their applicability across various domains.
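The Gray-code property stated in the abstract can be checked numerically. The sketch below is illustrative only and is not the paper's implementation: it assumes the standard reflected binary Gray code, `n ^ (n >> 1)`, which the abstract does not specify, and the helper names `gray`, `hamming`, and `distances_for_offset` are hypothetical.

```python
def gray(n: int) -> int:
    """Standard reflected binary Gray code of a non-negative integer (assumed variant)."""
    return n ^ (n >> 1)


def hamming(a: int, b: int) -> int:
    """Number of bit positions in which a and b differ."""
    return bin(a ^ b).count("1")


def distances_for_offset(k: int, max_pos: int = 256) -> set:
    """Collect every Hamming distance between gray(i) and gray(i + 2**k)."""
    return {hamming(gray(i), gray(i + 2 ** k)) for i in range(max_pos)}


if __name__ == "__main__":
    # Power-of-two offsets: each offset yields a single distance value.
    for k in range(5):
        print(f"offset 2^{k}: distances {distances_for_offset(k)}")
    # A non-power-of-two offset produces more than one distance value.
    print("offset 3:", {hamming(gray(i), gray(i + 3)) for i in range(256)})
```

Under this assumed Gray code, positions that differ by 1 are always at Hamming distance 1, and positions that differ by any larger power of two are always at Hamming distance 2, regardless of the absolute position. An offset such as 3 does not have this property, which is why the abstract describes the resulting encoding as an approximation of relative position.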
