SGS-3D: High-Fidelity 3D Instance Segmentation via Reliable Semantic Mask Splitting and Growing

Chaolei Wang; Yang Luo; Jing Du; Siyu Chen; Yiping Chen; Ting Han

TL;DR

SGS-3D tackles the pervasive errors in 3D instance segmentation arising from 2D-to-3D lifting by introducing a training-free split-then-grow refinement that fuses semantic and geometric cues. The method employs an occlusion-aware point-image mapping, co-occurrence-driven 2D mask filtering, and a semantic-guided aggregation pipeline with density-based splitting, feature-guided growing, and multi-view merging. It achieves state-of-the-art performance among training-free approaches on ScanNet200, ScanNet++, and KITTI-360, with notable robustness in depth-less outdoor environments, and enables open-set 3D understanding when combined with vision-language models. The approach provides a practical, generalizable bridge between 2D semantic foundations and 3D geometry for high-fidelity, class-agnostic 3D instance segmentation.
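As a rough illustration of what an occlusion-aware point-image mapping involves, the sketch below projects camera-frame points through a pinhole model and keeps only the points closest to the camera at each pixel. This is a minimal sketch under stated assumptions, not the authors' implementation: the z-buffer test, the function name `visible_point_pixels`, and the tolerance `tol` are all hypothetical choices made for this example.

```python
import math

def visible_point_pixels(points, fx, fy, cx, cy, width, height, tol=0.05):
    """Map camera-frame 3D points to pixels, keeping only unoccluded points.

    Assumption: occlusion is resolved with a simple per-pixel z-buffer built
    from the points themselves; the paper's actual visibility test may differ.
    Returns {point_index: (u, v)} for points considered visible.
    """
    zbuf = {}  # (u, v) -> smallest depth seen at that pixel
    proj = {}  # point index -> ((u, v), depth)
    for i, (x, y, z) in enumerate(points):
        if z <= 0:
            continue  # behind the camera
        u = int(math.floor(fx * x / z + cx))
        v = int(math.floor(fy * y / z + cy))
        if not (0 <= u < width and 0 <= v < height):
            continue  # projects outside the image
        proj[i] = ((u, v), z)
        if z < zbuf.get((u, v), math.inf):
            zbuf[(u, v)] = z
    # A point is visible if it lies within `tol` of the nearest depth at its pixel.
    return {i: uv for i, (uv, z) in proj.items() if z - zbuf[uv] <= tol}

# Hypothetical toy input: the second point sits directly behind the first,
# and the last point is behind the camera.
pts = [(0.0, 0.0, 1.0), (0.0, 0.0, 2.0), (0.2, 0.0, 1.0), (0.0, 0.0, -1.0)]
mapping = visible_point_pixels(pts, fx=100, fy=100, cx=50, cy=50, width=100, height=100)
```

Only points that survive this kind of visibility test would receive labels from a 2D mask, which keeps a mask from leaking onto geometry hidden behind the object it covers.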

Abstract

Accurate 3D instance segmentation is crucial for high-quality scene understanding in the 3D vision domain. However, methods based on 2D-to-3D lifting struggle to produce precise instance-level segmentation because errors accumulate during the lifting process, stemming from ambiguous semantic guidance and insufficient depth constraints. To tackle these challenges, we propose splitting and growing reliable semantic masks for high-fidelity 3D instance segmentation (SGS-3D), a novel "split-then-grow" framework that first purifies and splits ambiguous lifted masks using geometric primitives, and then grows them into complete instances within the scene. Unlike existing approaches that rely directly on raw lifted masks and sacrifice segmentation accuracy, SGS-3D serves as a training-free refinement method that jointly fuses semantic and geometric information, enabling effective cooperation between the two levels of representation. Specifically, for semantic guidance, we introduce a mask filtering strategy that leverages the co-occurrence of 3D geometric primitives to identify and remove ambiguous masks, thereby ensuring more reliable semantic consistency with the 3D object instances. For geometric refinement, we construct fine-grained object instances by exploiting both spatial continuity and high-level features, particularly in cases of semantic ambiguity between distinct objects. Experimental results on ScanNet200, ScanNet++, and KITTI-360 demonstrate that SGS-3D substantially improves segmentation accuracy and robustness against inaccurate masks from pre-trained models, yielding high-fidelity object instances while maintaining strong generalization across diverse indoor and outdoor environments. Code is available at https://github.com/wangchaolei7/SGS-3D.
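The "split" half of the framework can be pictured with a density-based clustering pass over the 3D points of a single lifted mask: if one mask accidentally covers two spatially separate objects, the points fall into two dense groups. The sketch below uses a plain DBSCAN-style procedure as a stand-in. The choice of DBSCAN, the function name `split_mask_by_density`, and the parameters `eps` and `min_pts` are assumptions made for illustration; the abstract does not specify the clustering algorithm, and the growing and merging stages are not shown.

```python
import math

def split_mask_by_density(points, eps=0.3, min_pts=3):
    """Split the 3D points of one lifted mask into spatially dense groups.

    Assumption: a DBSCAN-style clustering stands in for the paper's
    density-based splitting. Returns one label per point:
    0, 1, ... for clusters and -1 for isolated (noise) points.
    """
    n = len(points)
    labels = [None] * n  # None means "not visited yet"

    def neighbors(i):
        # Brute-force radius query; includes the point itself.
        return [j for j in range(n) if math.dist(points[i], points[j]) <= eps]

    cluster = 0
    for i in range(n):
        if labels[i] is not None:
            continue
        nbrs = neighbors(i)
        if len(nbrs) < min_pts:
            labels[i] = -1  # provisional noise; may become a border point later
            continue
        labels[i] = cluster
        queue = [j for j in nbrs if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # noise reachable from a core point joins as border
            if labels[j] is not None:
                continue  # already assigned; do not expand again
            labels[j] = cluster
            j_nbrs = neighbors(j)
            if len(j_nbrs) >= min_pts:
                queue.extend(j_nbrs)  # j is a core point, so keep expanding
        cluster += 1
    return labels

# Hypothetical mask that wrongly covers two separate objects plus one stray point.
mask_points = [
    (0.0, 0.0, 0.0), (0.1, 0.0, 0.0), (0.2, 0.0, 0.0), (0.1, 0.1, 0.0),  # object A
    (5.0, 0.0, 0.0), (5.1, 0.0, 0.0), (5.2, 0.0, 0.0),                   # object B
    (10.0, 10.0, 10.0),                                                  # stray point
]
labels = split_mask_by_density(mask_points)
```

In this toy case the single input mask is separated into two groups and one discarded point. In a split-then-grow pipeline, each resulting group would then act as a seed to be grown into a complete instance.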
