Serialized Point Mamba: A Serialized Point Cloud Mamba Segmentation Model

Tao Wang, Wei Wen, Jingzhi Zhai, Kang Xu, Haoming Luo

TL;DR

This work tackles 3D point cloud semantic segmentation for robotic perception by introducing Serialized Point Mamba, a data-dependent State-Space Model backbone that processes unordered point clouds through space-filling-curve serialization. It combines staged local sequence learning, grid pooling, and Enhanced Conditional Positional Encoding to achieve linear complexity with respect to sequence length, while delivering strong accuracy on ScanNet, S3DIS, and nuScenes. Key contributions include a multi-serializer approach with bidirectional Mamba, selective SSM via data-dependent parameters, and a U-Net–style architecture that preserves spatial structure. Empirical results show $76.8$ mIoU on ScanNet and $70.3$ mIoU on S3DIS for semantic segmentation, $40.0$ mAP on ScanNetv2 instance segmentation, and the lowest latency among comparators, highlighting practical benefits for real-time and large-scale point-cloud understanding. This work demonstrates the viability of applying Mamba-based sequence modeling to 3D vision, offering a scalable alternative to Transformer-based architectures for both indoor and outdoor datasets.
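As an illustration of what space-filling-curve serialization can look like, the sketch below orders points along a Z-order (Morton) curve. This is a minimal sketch under stated assumptions, not the authors' implementation: the summary does not say which curves the model uses, and the names `morton_code`, `serialize`, `grid_size`, and `bits` are hypothetical.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def _spread_bits(v: int, bits: int) -> int:
    """Move bit i of v to bit 3*i, leaving two zero bits in between."""
    out = 0
    for i in range(bits):
        out |= ((v >> i) & 1) << (3 * i)
    return out


def morton_code(ix: int, iy: int, iz: int, bits: int = 10) -> int:
    """Interleave the bits of three grid indices into one Z-order key."""
    return (_spread_bits(ix, bits)
            | (_spread_bits(iy, bits) << 1)
            | (_spread_bits(iz, bits) << 2))


def serialize(points: Sequence[Point], grid_size: float,
              bits: int = 10) -> List[int]:
    """Return point indices sorted along a Z-order curve.

    Assumption: points are first snapped to a regular grid of cell size
    `grid_size`; indices are clamped to fit in `bits` bits per axis.
    """
    mins = [min(p[d] for p in points) for d in range(3)]
    limit = (1 << bits) - 1
    keys = []
    for p in points:
        cell = [min(int((p[d] - mins[d]) / grid_size), limit)
                for d in range(3)]
        keys.append(morton_code(*cell, bits=bits))
    # Sorting by key turns the unordered set into a 1D sequence in which
    # neighbours on the curve tend to be neighbours in space.
    return sorted(range(len(points)), key=keys.__getitem__)


if __name__ == "__main__":
    pts = [(1.0, 1.0, 1.0), (0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
    print(serialize(pts, grid_size=1.0))
```

Several orderings of the same cloud, for example from permuting the axes before interleaving, would give different 1D views of one scene, which is the general idea behind a multi-serializer design.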

Abstract

Point cloud segmentation is crucial for robotic visual perception and environmental understanding, enabling applications such as robotic navigation and 3D reconstruction. However, handling the sparse and unordered nature of point cloud data presents challenges for efficient and accurate segmentation. Inspired by the Mamba model's success in natural language processing, we propose the Serialized Point Cloud Mamba Segmentation Model (Serialized Point Mamba), which leverages a state-space model to dynamically compress sequences, reduce memory usage, and enhance computational efficiency. Serialized Point Mamba integrates local-global modeling capabilities with linear complexity, achieving state-of-the-art performance on both indoor and outdoor datasets. This approach includes novel techniques such as staged point cloud sequence learning, grid pooling, and Conditional Positional Encoding, facilitating effective segmentation across diverse point cloud tasks. Our method achieved 76.8 mIoU on ScanNet and 70.3 mIoU on S3DIS. In ScanNetv2 instance segmentation, it recorded 40.0 mAP. It also had the lowest latency and reasonable memory use, making it the state of the art among Mamba-based point cloud semantic segmentation models.
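To make the linear-complexity claim concrete, the following toy sketch runs a selective state-space recurrence over a serialized sequence in a single pass, so the cost grows linearly with sequence length. It is a scalar illustration of the general Mamba-style recurrence, not the paper's layer: the weights `w_delta`, `w_b`, `w_c`, the fixed decay `a`, and the summation used in `bidirectional_scan` are all assumptions made for this example.

```python
import math
from typing import List, Sequence


def softplus(v: float) -> float:
    """Smooth positive function used here to keep the step size > 0."""
    return math.log1p(math.exp(v))


def selective_scan(x: Sequence[float], w_delta: float = 1.0,
                   w_b: float = 1.0, w_c: float = 1.0,
                   a: float = -1.0) -> List[float]:
    """One-pass scan with input-dependent (selective) parameters.

    Hypothetical scalar parameterization:
      delta_t = softplus(w_delta * x_t)
      h_t     = exp(delta_t * a) * h_{t-1} + delta_t * (w_b * x_t) * x_t
      y_t     = (w_c * x_t) * h_t
    """
    h = 0.0
    ys = []
    for x_t in x:
        delta = softplus(w_delta * x_t)  # input-dependent step size
        a_bar = math.exp(delta * a)      # discretized decay, in (0, 1) for a < 0
        b_bar = delta * (w_b * x_t)      # input-dependent input weight
        h = a_bar * h + b_bar * x_t      # fixed-size state: O(1) work per step
        ys.append((w_c * x_t) * h)
    return ys


def bidirectional_scan(x: Sequence[float]) -> List[float]:
    """Sum a forward scan and a backward scan (assumed fusion rule)."""
    fwd = selective_scan(x)
    bwd = selective_scan(x[::-1])[::-1]
    return [f + b for f, b in zip(fwd, bwd)]


if __name__ == "__main__":
    print(bidirectional_scan([1.0, 2.0, 1.0]))
```

Because the state `h` has a fixed size, memory does not grow with the number of points, in contrast to attention, which compares every pair of positions.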

Paper Structure (28 sections, 5 equations, 3 figures, 11 tables)

This paper contains 28 sections, 5 equations, 3 figures, 11 tables.

Figures (3)

  • Figure 1: The architecture of the Serialized Point Mamba encoder.
  • Figure 2: Bidirectional Mamba.
  • Figure 3: Multi-serialization methods utilization.