LaSSM: Efficient Semantic-Spatial Query Decoding via Local Aggregation and State Space Models for 3D Instance Segmentation
Lei Yao, Yi Wang, Yawen Cui, Moyun Liu, Lap-Pui Chau
TL;DR
LaSSM addresses two core bottlenecks in query-based 3D instance segmentation from point clouds: how to initialize a high-quality set of queries and how to refine them efficiently. It introduces a hierarchical semantic-spatial query initializer that derives query contents and coordinates from superpoints by jointly considering semantic cues and spatial distribution. It pairs this with a coordinate-guided state space model (SSM) decoder, which combines a local aggregation module with a spatial dual-path SSM to refine queries with positional awareness. The ablations indicate that the initializer improves scene coverage and convergence, while the decoder refines queries accurately at reduced computational cost. LaSSM ranks first on the ScanNet++ V2 leaderboard, outperforming the previous best method by 2.5% mAP with about one-third of its FLOPs, and it is competitive on ScanNet, ScanNet200, S3DIS, and ScanNet++ V1, which makes it practical for large-scale 3D scene understanding.
Abstract
Query-based 3D scene instance segmentation from point clouds has attained notable performance. However, existing methods suffer from a query initialization dilemma caused by the sparse nature of point clouds, and they rely on computationally intensive attention mechanisms in their query decoders. We therefore introduce LaSSM, which prioritizes simplicity and efficiency while maintaining competitive performance. Specifically, we propose a hierarchical semantic-spatial query initializer that derives the query set from superpoints by considering both semantic cues and spatial distribution, achieving comprehensive scene coverage and accelerated convergence. We further present a coordinate-guided state space model (SSM) decoder that progressively refines the queries. The decoder features a local aggregation scheme that restricts the model to geometrically coherent regions, and a spatial dual-path SSM block that captures dependencies within the query set by integrating the associated coordinate information. This design enables efficient instance prediction, avoiding the incorporation of noisy information and reducing redundant computation. LaSSM ranks first on the latest ScanNet++ V2 leaderboard, outperforming the previous best method by 2.5% mAP with only 1/3 of the FLOPs, demonstrating its strength in challenging large-scale scene instance segmentation. LaSSM also achieves competitive performance on the ScanNet, ScanNet200, S3DIS, and ScanNet++ V1 benchmarks at lower computational cost. Extensive ablation studies and qualitative results validate the effectiveness of our design. The code and weights are available at https://github.com/RayYoh/LaSSM.
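To make the idea of choosing queries from superpoints by both semantic cues and spatial distribution more concrete, the following is a minimal sketch and not the released implementation. It assumes each superpoint has a center coordinate and a scalar foreground score, keeps the highest-scoring superpoints, and then spreads the queries over the scene with farthest point sampling. The function name `select_queries`, the `keep_ratio` parameter, and the use of farthest point sampling are all illustrative assumptions; the abstract does not specify these details.

```python
import numpy as np


def select_queries(centers, fg_scores, num_queries, keep_ratio=0.5):
    """Hypothetical two-stage query selection over superpoints.

    centers:   (N, 3) superpoint center coordinates (assumed input).
    fg_scores: (N,) foreground scores in [0, 1] (assumed input).
    Returns indices into the original superpoint array.
    """
    n = centers.shape[0]

    # Stage 1 (semantic): keep the superpoints with the highest scores,
    # but never fewer than the number of queries requested.
    num_keep = min(n, max(num_queries, int(np.ceil(keep_ratio * n))))
    keep = np.argsort(-fg_scores, kind="stable")[:num_keep]

    # Stage 2 (spatial): farthest point sampling over the kept centers,
    # starting from the highest-scoring superpoint.
    pts = centers[keep]
    k = min(num_queries, num_keep)
    chosen = [0]
    dist = np.linalg.norm(pts - pts[0], axis=1)
    for _ in range(k - 1):
        nxt = int(np.argmax(dist))
        chosen.append(nxt)
        # Track each point's distance to its nearest already-chosen point.
        dist = np.minimum(dist, np.linalg.norm(pts - pts[nxt], axis=1))

    return keep[np.array(chosen)]
```

Calling `select_queries(centers, fg_scores, num_queries=400)` on per-scene superpoint arrays would return the indices of the superpoints whose features and coordinates seed the queries in this sketch. The semantic filter is meant to discard likely background, while the spatial sampling prevents the remaining queries from clustering on a few large objects.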
