PointABM: Integrating Bidirectional State Space Model with Multi-Head Self-Attention for Point Cloud Analysis
Jia-wei Chen, Yu-jie Xiong, Yong-bin Gao
TL;DR
PointABM addresses efficient and effective 3D point cloud analysis by integrating a Transformer-based encoder with a Bidirectional State Space Model (Bi-SSM), preserving input order while enriching global context. The architecture first processes patches with a Transformer and then refines them through a Bi-SSM stack, enabling joint modeling of local detail and global structure with near-linear complexity. Ablation studies confirm the value of both the Transformer embedding and the Bi-SSM embedding, showing consistent accuracy gains on ScanObjectNN and ModelNet40, including 93.1% accuracy on ModelNet40 and notable improvements over strong Transformer baselines and the prior Mamba approach. The method is pretrained with a masked autoencoder strategy and balances expressive power with computational efficiency for point cloud classification, making it suitable for robotics and autonomous systems where real-time 3D processing is critical. The $O(n^2 d)$ complexity of pure attention is mitigated by the Bi-SSM pathway, enabling scalable performance while maintaining robust feature extraction across unordered 3D data.
Abstract
Mamba, which is based on the state space model (SSM), has demonstrated its strength in 3D point cloud analysis through its linear complexity and strong classification results. Before that, the Transformer had emerged as one of the most prominent and successful architectures for point cloud analysis. We present PointABM, a hybrid model that integrates the Mamba and Transformer architectures to enhance local features and improve the performance of 3D point cloud analysis. To enhance the extraction of global features, we introduce a bidirectional SSM (bi-SSM) framework, which comprises both a traditional token forward SSM and an innovative backward SSM. To help the bi-SSM capture more comprehensive features without disrupting the sequence relationships required by the bidirectional Mamba, we introduce a Transformer and use its self-attention mechanism to process point clouds. Extensive experimental results demonstrate that integrating Mamba with the Transformer significantly enhances the model's capability to analyze 3D point clouds.
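The combination described above, self-attention followed by a forward and a backward SSM scan, can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation. Every name here (`self_attention`, `ssm_scan`, `bi_ssm`, `hybrid_block`) is hypothetical, the attention uses identity Q/K/V projections and a single head, the SSM uses fixed scalar parameters `a`, `b`, `c` rather than Mamba's input-dependent ones, and summing the two scan directions is an assumed fusion rule.

```python
import math

def self_attention(tokens):
    """Single-head self-attention with identity Q/K/V projections (a simplification)."""
    d = len(tokens[0])
    out = []
    for q in tokens:
        # Scaled dot-product scores of this query against every token: O(n) per query.
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in tokens]
        m = max(scores)  # subtract the max for a numerically stable softmax
        weights = [math.exp(s - m) for s in scores]
        total = sum(weights)
        out.append([sum(w / total * v[j] for w, v in zip(weights, tokens))
                    for j in range(d)])
    return out

def ssm_scan(tokens, a=0.5, b=1.0, c=1.0):
    """Per-channel linear recurrence h_t = a*h_{t-1} + b*x_t, y_t = c*h_t."""
    h = [0.0] * len(tokens[0])
    ys = []
    for x in tokens:  # one pass over the sequence: linear in its length
        h = [a * hj + b * xj for hj, xj in zip(h, x)]
        ys.append([c * hj for hj in h])
    return ys

def bi_ssm(tokens, a=0.5, b=1.0, c=1.0):
    """Forward scan plus a scan over the reversed sequence, re-aligned and summed."""
    fwd = ssm_scan(tokens, a, b, c)
    bwd = ssm_scan(tokens[::-1], a, b, c)[::-1]  # assumed fusion: elementwise sum
    return [[f + g for f, g in zip(fr, br)] for fr, br in zip(fwd, bwd)]

def hybrid_block(tokens):
    """Attention first, then the bidirectional scan; token order is preserved."""
    return bi_ssm(self_attention(tokens))

# An impulse at the first token decays along the forward direction only.
print(bi_ssm([[1.0], [0.0], [0.0]]))  # [[2.0], [0.5], [0.25]]
```

In this sketch the backward scan lets each token also receive information from tokens that come later in the sequence, which a forward-only recurrence cannot provide. The attention step compares every pair of tokens, while each scan visits each token once.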
