Spiking Point Transformer for Point Cloud Classification

Peixi Wu, Bosong Chai, Hebei Li, Menghua Zheng, Yansong Peng, Zeyu Wang, Xuan Nie, Yueyi Zhang, Xiaoyan Sun

TL;DR

This work presents Spiking Point Transformer (SPT), a transformer-based SNN for 3D point cloud classification. It introduces Queue-Driven Sampling Direct Encoding (Q-SDE) and the Hybrid Dynamics Integrate-and-Fire (HD-IF) neuron to reach high accuracy at low energy cost. The architecture combines spiking local attention and neighbor interactions in a Spiking Point Encoder Module, enabling multi-time-step processing with reduced memory and computation. On ModelNet10, ModelNet40, and ScanObjectNN, SPT achieves state-of-the-art accuracy among SNNs, with a theoretical energy consumption at least $6.4\times$ lower than that of its ANN counterpart. These results support the viability of energy-efficient neuromorphic approaches for 3D perception and can inform future hardware design for point-cloud processing.

Abstract

Spiking Neural Networks (SNNs) offer an attractive and energy-efficient alternative to conventional Artificial Neural Networks (ANNs) due to their sparse binary activation. Combining SNNs with Transformers has shown great potential in 2D image processing, but their application to 3D point clouds remains underexplored. To this end, we present Spiking Point Transformer (SPT), the first transformer-based SNN framework for point cloud classification. Specifically, we first design Queue-Driven Sampling Direct Encoding (Q-SDE) for point clouds to reduce computational costs while retaining the most effective support points at each time step. We then introduce the Hybrid Dynamics Integrate-and-Fire Neuron (HD-IF), designed to simulate selective neuron activation and reduce over-reliance on specific artificial neurons. Among SNNs, SPT attains state-of-the-art results on three benchmark datasets that span both synthetic and real-world data. Meanwhile, the theoretical energy consumption of SPT is at least $6.4\times$ lower than that of its ANN counterpart.

Paper Structure

This paper contains 24 sections, 3 equations, 4 figures, 6 tables, 1 algorithm.

Figures (4)

  • Figure 1: Overview of Spiking Point Transformer (SPT), which consists of Queue-Driven Sampling Direct Encoding (Q-SDE), an MLP Module for adaptive learning, a Spiking Point Encoder Module for feature interaction, and a Classification Head.
  • Figure 2: (a) The main structure of HD-IF, integrating neuronal membrane potential and firing. (b) The membrane potential of different neurons with an input of 0.4 and a threshold of 0.5.
  • Figure 3: Visualization of support points and points at each time step. Support points repeated across most time steps capture the essence of the object shape. Blue points are the enqueue points while red points are the dequeue points.
  • Figure 4: Visualization of selectively activated neurons on different datasets. The solid line shows the neurons most frequently activated as Top-1, while the dashed line shows the neurons most frequently activated as Top-2.
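
The setting in the Figure 2(b) caption (input 0.4, threshold 0.5) can be reproduced with two textbook neuron models to see why different dynamics produce different membrane traces. The sketch below is only an illustration under stated assumptions: it simulates a plain integrate-and-fire (IF) neuron and a leaky integrate-and-fire (LIF) neuron, not HD-IF itself, whose internals are not given in this summary. The hard reset to 0, the time constant `tau=2.0`, and the function names are all assumptions chosen for the example.

```python
def simulate_if(current, threshold, steps):
    """Plain integrate-and-fire: accumulate input, spike at threshold, hard-reset to 0."""
    v = 0.0
    potentials, spikes = [], []
    for _ in range(steps):
        v += current                 # integrate the constant input
        fired = v >= threshold
        potentials.append(v)         # potential recorded before any reset
        spikes.append(int(fired))
        if fired:
            v = 0.0                  # hard reset (assumed reset rule)
    return potentials, spikes


def simulate_lif(current, threshold, steps, tau=2.0):
    """Leaky integrate-and-fire: the potential relaxes toward the input with time constant tau."""
    v = 0.0
    potentials, spikes = [], []
    for _ in range(steps):
        v += (current - v) / tau     # leaky update; tau is an assumed value
        fired = v >= threshold
        potentials.append(v)
        spikes.append(int(fired))
        if fired:
            v = 0.0
    return potentials, spikes


if __name__ == "__main__":
    # Same input and threshold as quoted in the Figure 2(b) caption.
    for name, sim in (("IF", simulate_if), ("LIF", simulate_lif)):
        potentials, spikes = sim(current=0.4, threshold=0.5, steps=6)
        print(name, [round(p, 3) for p in potentials], spikes)
```

Under these assumptions the IF neuron fires on every second step, while the LIF potential saturates just below the input value of 0.4 and never reaches the threshold. The same input can therefore drive one neuron type and leave another silent, which is the kind of difference a caption comparing "different neurons" would show.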