PointABM: Integrating Bidirectional State Space Model with Multi-Head Self-Attention for Point Cloud Analysis

Jia-wei Chen; Yu-jie Xiong; Yong-bin Gao

TL;DR

PointABM tackles efficient and effective 3D point cloud analysis by combining a Transformer-based encoder with a Bidirectional State Space Model (Bi-SSM) encoder, which scans the token sequence in both directions to enrich global context without disrupting its order. The architecture first processes point patches with a Transformer and then refines them through a stack of Bi-SSM blocks, jointly modeling local detail and global structure while the SSM stage scales linearly with sequence length. Ablation studies confirm the value of both the Transformer embedding and the Bi-SSM embedding, showing consistent accuracy gains on ScanObjectNN and ModelNet40, including 93.1% accuracy on ModelNet40 and notable improvements over strong Transformer baselines and the prior Mamba-based approach. The model is pretrained with a masked autoencoder strategy and balances expressive power with computational efficiency for point cloud classification, making it suitable for robotics and autonomous systems where real-time 3D processing is critical. The Bi-SSM pathway mitigates the $O(n^2 d)$ cost of pure self-attention, allowing the model to scale while maintaining robust feature extraction from unordered 3D data.
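The quadratic term comes from scoring every patch token against every other token. The pure-Python sketch below shows only the attention sub-layer of a pre-norm Transformer block (LayerNorm, multi-head self-attention, residual). It is an illustration under stated assumptions, not PointABM's implementation: the identity Q/K/V projections, the omitted MLP sub-layer, and the helper names are hypothetical simplifications. The double loop over `i` and `j` builds an n-by-n score table per head, each entry a dot product of width d / num_heads, which is where the $O(n^2 d)$ cost arises.

```python
import math
from typing import List

Matrix = List[List[float]]  # n tokens, each a list of d features


def layer_norm(x: List[float], eps: float = 1e-5) -> List[float]:
    """Normalize one token to zero mean and unit variance (no learned scale/shift)."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax over one row of attention scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def multi_head_self_attention(tokens: Matrix, num_heads: int) -> Matrix:
    """Split channels into heads and attend within each head.

    Hypothetical simplification: Q, K and V are the tokens themselves;
    a real block applies learned projections W_q, W_k, W_v and W_o.
    """
    n, d = len(tokens), len(tokens[0])
    if d % num_heads != 0:
        raise ValueError("feature width must be divisible by num_heads")
    hd = d // num_heads
    out = [[0.0] * d for _ in range(n)]
    for h in range(num_heads):
        lo = h * hd
        sub = [t[lo:lo + hd] for t in tokens]  # this head's slice of every token
        for i in range(n):
            # n scores per query and n queries: the n^2 factor of the cost
            scores = [
                sum(q * k for q, k in zip(sub[i], sub[j])) / math.sqrt(hd)
                for j in range(n)
            ]
            weights = softmax(scores)
            for c in range(hd):
                out[i][lo + c] = sum(weights[j] * sub[j][c] for j in range(n))
    return out


def attention_sublayer(tokens: Matrix, num_heads: int = 2) -> Matrix:
    """Pre-norm attention sub-layer: x + MHSA(LayerNorm(x)); the MLP sub-layer is omitted."""
    normed = [layer_norm(t) for t in tokens]
    attended = multi_head_self_attention(normed, num_heads)
    return [[x + a for x, a in zip(t, at)] for t, at in zip(tokens, attended)]
```

Because every output row is a softmax-weighted average over all tokens, a single layer already lets each patch see the whole cloud; the price is the n-by-n score table that a linear-time scan avoids.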

Abstract

Mamba, which is built on the state space model (SSM), has demonstrated its strength in 3D point cloud analysis thanks to its linear complexity and strong classification results. Before Mamba, the Transformer had emerged as one of the most prominent and successful architectures for point cloud analysis. We present PointABM, a hybrid model that integrates the Mamba and Transformer architectures to strengthen local features and improve the performance of 3D point cloud analysis. To enhance the extraction of global features, we introduce a bidirectional SSM (bi-SSM) framework that comprises both a conventional forward token SSM and a novel backward SSM. To let the bi-SSM capture more comprehensive features without disrupting the sequence relationships that bidirectional Mamba relies on, we introduce a Transformer that uses its self-attention mechanism to process point clouds. Extensive experimental results demonstrate that integrating Mamba with the Transformer significantly enhances the model's capability to analyze 3D point clouds.
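The forward/backward idea can be sketched with a scalar linear recurrence. This is a simplified stand-in, not Mamba's selective scan: the fixed parameters `a`, `b`, `c`, the scalar state, and merging the two directions by summation are assumptions made for illustration (Mamba makes its parameters input-dependent, and the paper may merge directions differently). The backward SSM here runs the same scan over the reversed token sequence and flips its outputs back, so that position t still lines up with token t.

```python
from typing import List


def ssm_scan(x: List[float], a: float, b: float, c: float) -> List[float]:
    """Causal linear SSM: h_t = a * h_{t-1} + b * x_t, y_t = c * h_t, with h_0 = 0.

    Each output depends only on the current and earlier tokens, and the
    whole pass is a single loop, so the cost is linear in len(x).
    """
    h = 0.0
    ys = []
    for x_t in x:
        h = a * h + b * x_t
        ys.append(c * h)
    return ys


def bi_ssm(x: List[float], a: float = 0.5, b: float = 1.0, c: float = 1.0) -> List[float]:
    """Hypothetical bidirectional SSM: forward scan plus re-aligned backward scan."""
    forward = ssm_scan(x, a, b, c)
    # Scan the reversed sequence, then reverse the result so index t matches token t.
    backward = ssm_scan(x[::-1], a, b, c)[::-1]
    # Merging by summation is an assumption; concatenation or gating are alternatives.
    return [f + r for f, r in zip(forward, backward)]


if __name__ == "__main__":
    tokens = [0.0, 0.0, 0.0, 1.0]  # only the last token is non-zero
    print(ssm_scan(tokens, 0.5, 1.0, 1.0))  # [0.0, 0.0, 0.0, 1.0]
    print(bi_ssm(tokens))                   # [0.125, 0.25, 0.5, 2.0]
```

With the forward scan alone, the first three positions never see the last token; the backward scan propagates it to them with geometric decay, which is the kind of extra context a bidirectional design is meant to supply.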

Paper Structure

This paper contains 16 sections, 4 equations, 2 figures, and 4 tables.

Introduction
Related work
Point Cloud Transformers
State Space Models
POINTABM
Overall
Transformer Block
Bidirectional State Space Block
Experiments
Implementation Details
Classification Tasks
ScanObjectNN
ModelNet40
Ablation study
Transformer embedding
...and 1 more section

Figures (2)

Figure 1: The pipeline of PointABM. We first apply farthest point sampling (FPS) and k-nearest neighbors (KNN) to the input point cloud to select keypoints and group the neighboring points into patches. The patches are then fed into the Transformer Encoder. Finally, the encoded features are passed to a Mamba Encoder composed of N bidirectional Mamba blocks.
Figure 2: (a) Transformer Block, (b) Bidirectional State Space Block.
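The patching step in the Figure 1 caption can be illustrated with a brute-force, pure-Python sketch of farthest point sampling followed by k-nearest-neighbor grouping. The function names, the choice of the first point as the FPS seed, and re-centering each patch on its keypoint are assumptions for illustration; practical pipelines use batched GPU kernels rather than these O(N·M) and O(M·N log N) loops.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def farthest_point_sampling(points: Sequence[Point], num_centers: int, start: int = 0) -> List[int]:
    """Greedily pick indices that are as far as possible from the ones already chosen."""
    centers = [start]
    # Distance from each point to its nearest chosen center so far.
    min_dist = [math.dist(p, points[start]) for p in points]
    while len(centers) < num_centers:
        nxt = max(range(len(points)), key=lambda i: min_dist[i])
        centers.append(nxt)
        for i, p in enumerate(points):
            min_dist[i] = min(min_dist[i], math.dist(p, points[nxt]))
    return centers


def knn_patches(points: Sequence[Point], centers: Sequence[int], k: int) -> List[List[Point]]:
    """Group the k nearest points (the keypoint included) around each center."""
    patches = []
    for c in centers:
        cx, cy, cz = points[c]
        nearest = sorted(range(len(points)), key=lambda i: math.dist(points[i], points[c]))[:k]
        # Express each patch relative to its keypoint (an assumed normalization).
        patches.append([(points[i][0] - cx, points[i][1] - cy, points[i][2] - cz) for i in nearest])
    return patches


if __name__ == "__main__":
    cloud = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0), (10.0, 0.0, 0.0)]
    centers = farthest_point_sampling(cloud, 2)
    print(centers)                          # [0, 3]
    print(knn_patches(cloud, centers, 2))   # [[(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)], [(0.0, 0.0, 0.0), (-8.0, 0.0, 0.0)]]
```

FPS spreads the keypoints across the cloud (here it jumps straight to the distant point at x = 10), and KNN then gives every keypoint a fixed-size local patch, which is what makes the patches usable as a token sequence for the encoders.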
