
Point Cloud Mamba: Point Cloud Learning via State Space Model

Tao Zhang, Haobo Yuan, Lu Qi, Jiangning Zhang, Qianyu Zhou, Shunping Ji, Shuicheng Yan, Xiangtai Li

TL;DR

This work introduces Point Cloud Mamba (PCM), a Mamba-based framework for global point cloud modeling. PCM combines Consistent Traverse Serialization (CTS), the joint use of its six serialization variants, order prompts, and a positional encoding based on spatial coordinate mapping. It captures long-range dependencies across 3D points with linear computational complexity and surpasses both point-based and transformer-based SOTA methods on ScanObjectNN, ModelNet40, ShapeNetPart, and S3DIS. When paired with a stronger local feature extraction module, PCM reaches 79.6 mIoU on S3DIS.

Abstract

Recently, state space models have exhibited strong global modeling capabilities and linear computational complexity in contrast to transformers. This research focuses on applying this architecture to model point cloud data globally, more efficiently and effectively, with linear computational complexity. In particular, for the first time, we demonstrate that Mamba-based point cloud methods can outperform previous methods based on transformers or multi-layer perceptrons (MLPs). To enable Mamba to process 3-D point cloud data more effectively, we propose a novel Consistent Traverse Serialization method to convert point clouds into 1-D point sequences while ensuring that neighboring points in the sequence are also spatially adjacent. Consistent Traverse Serialization yields six variants by permuting the order of the x, y, and z coordinates, and the synergistic use of these variants aids Mamba in comprehensively observing point cloud data. Furthermore, to assist Mamba in handling point sequences with different orders more effectively, we introduce point prompts to inform Mamba of the sequence's arrangement rules. Finally, we propose positional encoding based on spatial coordinate mapping to inject positional information into point cloud sequences more effectively. Point Cloud Mamba surpasses the state-of-the-art (SOTA) point-based method PointNeXt and achieves new SOTA performance on the ScanObjectNN, ModelNet40, ShapeNetPart, and S3DIS datasets. It is worth mentioning that when using a more powerful local feature extraction module, our PCM achieves 79.6 mIoU on S3DIS, significantly surpassing the previous SOTA models, DeLA and PTv3, by 5.5 mIoU and 4.9 mIoU, respectively.
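
The serialization idea in the abstract can be illustrated with a small sketch. The code below is not the authors' implementation: it assumes a simple min-max voxelization and a snake-like (boustrophedon) traversal as one way to keep consecutive sequence elements spatially adjacent, and the names `voxelize`, `snake_key`, `serialize`, and `all_variants` are hypothetical. The six variants come from permuting which axis is traversed slowest, middle, and fastest.

```python
from itertools import permutations


def voxelize(points, grid_size):
    """Map float (x, y, z) points to integer voxel indices in [0, grid_size).

    Assumption: plain min-max normalization per axis; the paper's exact
    voxelization may differ.
    """
    mins = [min(p[a] for p in points) for a in range(3)]
    maxs = [max(p[a] for p in points) for a in range(3)]
    cells = []
    for p in points:
        cell = []
        for a in range(3):
            extent = (maxs[a] - mins[a]) or 1.0  # avoid division by zero
            idx = int((p[a] - mins[a]) / extent * grid_size)
            cell.append(min(idx, grid_size - 1))  # clamp the max-valued point
        cells.append(tuple(cell))
    return cells


def snake_key(cell, order, grid_size):
    """Sort key for a snake-like traversal with axis priority `order`.

    order[0] is the slowest axis and order[2] the fastest. The direction of
    each faster axis flips whenever a slower axis advances, so voxels that
    are consecutive in the sequence are also adjacent in the grid.
    """
    i, j, k = (cell[a] for a in order)
    j_key = j if i % 2 == 0 else grid_size - 1 - j
    k_key = k if (i + j) % 2 == 0 else grid_size - 1 - k
    return (i, j_key, k_key)


def serialize(points, order=(0, 1, 2), grid_size=3):
    """Return point indices in serialized order.

    Points that share a voxel keep their input order (stable sort).
    """
    cells = voxelize(points, grid_size)
    return sorted(range(len(points)),
                  key=lambda n: snake_key(cells[n], order, grid_size))


def all_variants(points, grid_size=3):
    """Build the six serializations obtained by permuting x, y, and z."""
    return {
        "".join("xyz"[a] for a in order): serialize(points, order, grid_size)
        for order in permutations(range(3))
    }
```

Calling `all_variants(points)` returns a dictionary keyed by `"xyz"`, `"xzy"`, `"yxz"`, and so on, each holding one ordering of the same point indices. How the six sequences are combined inside the network is not specified in this section, so the sketch stops at serialization.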

Paper Structure

This paper contains 14 sections, 13 equations, 7 figures, 16 tables.

Figures (7)

  • Figure 1: Several pipelines of point cloud modeling. (a) denotes point-based methods with only local perception, such as PointNet [qi2017pointnet], PointNet++ [qi2017pointnet++], PointMLP [ma2022pointmlp], and PointNeXt [qian2022pointnext]. (b) denotes transformer-based methods with global perception but quadratic computational cost, including Point Transformer [point_transformer] and Point-MAE [pang2022point-mae]. (c) represents Mamba-based methods, which offer global modeling with linear computational complexity.
  • Figure 2: The architecture of our proposed Point Cloud Mamba. The PCM encoder consists of four stages, each comprising a geometric affine module and several Mamba layers. Point downsampling is performed between stages. The decoder consists only of point interpolation, feature concatenation, and an MLP.
  • Figure 3: The consistent traverse serialization strategy. The 3-D point cloud data is voxelized and then serialized into a 1-D point sequence according to a predefined order. M represents the total number of points in the point cloud. With the permutation of x, y, and z coordinates, consistent traverse serialization has six variants.
  • Figure 4: The order prompts. Different colors represent different serialization orders. $N_p$ order prompts are mapped to the same channel size as the features and then concatenated to the beginning and end of the input point sequence.
  • Figure 5: The failure cases of PCM. Incorrect areas are highlighted by red rectangles.
  • ...and 2 more figures
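
The order-prompt mechanism described in the Figure 4 caption can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's code: the prompt table, the linear map `weight`, and the function names are hypothetical, and the sketch assumes the same mapped prompts are used at both the beginning and the end of the sequence, which the caption does not confirm.

```python
import random


def make_prompts(num_orders, n_prompts, prompt_dim, seed=0):
    """Create a table of N_p prompt vectors for each serialization order.

    Assumption: random initialization stands in for learnable parameters.
    """
    rng = random.Random(seed)
    return [
        [[rng.uniform(-1.0, 1.0) for _ in range(prompt_dim)]
         for _ in range(n_prompts)]
        for _ in range(num_orders)
    ]


def project(vec, weight):
    """Linearly map a prompt vector to the feature channel size.

    `weight` has shape (channels, prompt_dim), stored as nested lists.
    """
    return [sum(w * v for w, v in zip(row, vec)) for row in weight]


def add_order_prompts(features, order_id, prompts, weight):
    """Attach the prompts of one serialization order to both sequence ends.

    `features` is a list of M per-point feature vectors; the result has
    M + 2 * N_p tokens, all with the same channel size.
    """
    mapped = [project(p, weight) for p in prompts[order_id]]
    return mapped + features + mapped
```

With six serialization orders, `order_id` would range from 0 to 5, so each of the six sequences carries a distinct marker of how it was arranged.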