Pamba: Enhancing Global Interaction in Point Clouds via State Space Model

Zhuoyuan Li, Yubo Ai, Jiahao Lu, ChuXin Wang, Jiacheng Deng, Hanzhi Chang, Yanzhe Liang, Wenfei Yang, Shifeng Zhang, Tianzhu Zhang

TL;DR

Pamba addresses the scalability bottlenecks of transformer-based 3D point cloud segmentation by adopting a state space model (Mamba) backbone with linear complexity. It introduces a multi-path serialization strategy built on hz curves and the ConvMamba block, which fuses global long-range dependencies with local geometry and allows whole scenes to be processed without patching. The approach achieves state-of-the-art results on ScanNet v2, ScanNet200, S3DIS, and nuScenes, with competitive memory use and latency. The work demonstrates the practical potential of bidirectional, globally aware SSMs for large-scale 3D scene understanding and offers concrete design choices for integrating global and local cues in point clouds.

Abstract

Transformers have demonstrated impressive results for 3D point cloud semantic segmentation. However, the quadratic complexity of transformers makes computation costly, limiting the number of points that can be processed simultaneously and impeding the modeling of long-range dependencies between objects in a single scene. Drawing inspiration from the great potential of recent state space models (SSMs) for long-sequence modeling, we introduce Mamba, an SSM-based architecture, to the point cloud domain and propose Pamba, a novel architecture with strong global modeling capability under linear complexity. Specifically, to reconcile the unordered nature of point clouds with the causal nature of Mamba, we propose a multi-path serialization strategy applicable to point clouds. In addition, we propose the ConvMamba block to compensate for Mamba's shortcomings in modeling local geometries and its unidirectional modeling. Pamba obtains state-of-the-art results on several 3D point cloud segmentation benchmarks, including ScanNet v2, ScanNet200, S3DIS and nuScenes, and its effectiveness is validated by extensive experiments.
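
The serialization idea in the abstract can be illustrated with a small sketch. The hz curve itself is not defined in this section, so the code below substitutes a plain z-order (Morton) curve as a stand-in, and it assumes that different "paths" are obtained by permuting the axis order. The function names, `voxel_size`, `bits`, and `axis_order` are hypothetical choices made for illustration, not the authors' implementation.

```python
import numpy as np

def morton_codes(grid: np.ndarray, bits: int = 10, axis_order=(0, 1, 2)) -> np.ndarray:
    """Interleave the bits of integer grid coordinates into one z-order key per point."""
    g = grid[:, list(axis_order)].astype(np.int64)
    codes = np.zeros(len(g), dtype=np.int64)
    for b in range(bits):          # bit position inside each coordinate
        for k in range(3):         # which (permuted) axis the bit comes from
            bit = (g[:, k] >> b) & 1
            codes |= bit << (3 * b + k)
    return codes

def serialize(points: np.ndarray, voxel_size: float = 0.05,
              bits: int = 10, axis_order=(0, 1, 2)) -> np.ndarray:
    """Return a permutation that orders an unordered point set along a z-order curve."""
    grid = np.floor((points - points.min(axis=0)) / voxel_size).astype(np.int64)
    if grid.max() >= 2 ** bits:
        raise ValueError("scene extent exceeds the range covered by `bits`")
    return np.argsort(morton_codes(grid, bits, axis_order), kind="stable")

# Several serialization paths over the same scene (assumed: one per axis order).
rng = np.random.default_rng(0)
points = rng.uniform(0.0, 5.0, size=(1000, 3))
paths = [serialize(points, axis_order=o) for o in [(0, 1, 2), (1, 0, 2), (2, 1, 0)]]
```

Each entry of `paths` is an index array; `points[paths[i]]` gives the sequence that a causal sequence model would consume for that path. Points that are neighbors in the sequence tend to be close in space, but the converse does not always hold, which is one reason to use more than one path.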

Paper Structure

This paper contains 27 sections, 4 equations, 5 figures, 5 tables.

Figures (5)

  • Figure 1: Visualization of effective receptive fields (ERF) of the point of interest on the ScanNet200 dataset. The yellow star marks the position of the point of interest. Pamba shows a larger ERF and the ability to handle long-range interactions between different objects in a scene. More illustrations are provided in the Appendix.
  • Figure 2: Left: The overall architecture of Pamba; Right: ConvMamba block.
  • Figure 3: Space-filling curves.
  • Figure 4: The proposed hz curve and hz-swap curve.
  • Figure 5: Bidirectional Mamba. (a) Original Mamba structure; (b) Proposed bidirectional Mamba; (c) Global aggregation.
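
To make the unidirectional limitation behind Figure 5 concrete, here is a toy sketch of a causal linear recurrence and a bidirectional variant. It is not Mamba: Mamba's state-space parameters depend on the input, whereas this sketch uses fixed scalars `a` and `b`. Summing the two directions is also an assumption, since this section does not say how Pamba aggregates them.

```python
import numpy as np

def linear_scan(x: np.ndarray, a: float = 0.9, b: float = 1.0) -> np.ndarray:
    """Causal recurrence h_t = a * h_{t-1} + b * x_t, computed in one O(N) pass."""
    h = np.zeros_like(x[0], dtype=np.float64)
    out = np.empty(x.shape, dtype=np.float64)
    for t in range(len(x)):
        h = a * h + b * x[t]   # the state only ever sees x[0..t]
        out[t] = h
    return out

def bidirectional_scan(x: np.ndarray, a: float = 0.9, b: float = 1.0) -> np.ndarray:
    """Run the scan in both directions so every position receives context from both sides."""
    forward = linear_scan(x, a, b)
    backward = linear_scan(x[::-1], a, b)[::-1]
    return forward + backward  # assumed aggregation: simple sum

# An impulse in the middle of the sequence.
x = np.array([0.0, 0.0, 1.0, 0.0, 0.0])
causal = linear_scan(x, a=0.5)            # positions before the impulse stay at zero
both = bidirectional_scan(x, a=0.5)       # positions on both sides respond
```

With the causal scan, points that come earlier in the serialized order never receive information from later ones; the reversed pass removes that asymmetry while keeping the cost linear in sequence length.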