SA3DIP: Segment Any 3D Instance with Potential 3D Priors

Xi Yang, Xu Gu, Xingyilang Yin, Xinbo Gao

TL;DR

SA3DIP is a novel method for Segmenting Any 3D Instances via exploiting potential 3D Priors. It generates complementary 3D primitives based on both geometric and textural priors, which reduces the initial errors that accumulate in subsequent procedures.

Abstract

The proliferation of 2D foundation models has sparked research into adapting them for open-world 3D instance segmentation. Recent methods introduce a paradigm that leverages superpoints as geometric primitives and incorporates 2D multi-view masks from the Segment Anything Model (SAM) as merging guidance, achieving outstanding zero-shot instance segmentation results. However, the limited use of 3D priors restricts the segmentation performance. Previous methods calculate the 3D superpoints solely from normals estimated from spatial coordinates, resulting in under-segmentation for instances with similar geometry. In addition, the heavy reliance on SAM and hand-crafted algorithms in 2D space leads to over-segmentation due to SAM's inherent part-level segmentation tendency. To address these issues, we propose SA3DIP, a novel method for Segmenting Any 3D Instances via exploiting potential 3D Priors. Specifically, on the one hand, we generate complementary 3D primitives based on both geometric and textural priors, which reduces the initial errors that accumulate in subsequent procedures. On the other hand, we introduce supplemental constraints from the 3D space by using a 3D detector to guide a further merging process. Furthermore, we notice a considerable portion of low-quality ground truth annotations in the ScanNetV2 benchmark, which affects fair evaluation. Thus, we present ScanNetV2-INS, which has complete ground truth labels and additional instances for 3D class-agnostic instance segmentation. Experimental evaluations on various 2D-3D datasets demonstrate the effectiveness and robustness of our approach. Our code and proposed ScanNetV2-INS dataset are available HERE.
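The idea of grouping 3D primitives from both geometric and textural priors can be illustrated with a toy sketch. This is not the authors' implementation: the function `group_primitives`, the neighbor `edges`, and both thresholds are hypothetical, and the merge rule (union-find over neighboring points that agree in normal direction and color) is only an assumed stand-in for the paper's superpoint computation. It shows why a normal-only criterion under-segments coplanar objects while an added color criterion separates them.

```python
import math

def find(parent, i):
    # Union-find root lookup with path halving.
    while parent[i] != i:
        parent[i] = parent[parent[i]]
        i = parent[i]
    return i

def group_primitives(normals, colors, edges,
                     max_angle_deg=10.0, max_color_dist=0.1):
    """Merge neighboring points whose unit normals AND colors are similar.

    Hypothetical helper: thresholds and merge rule are illustrative assumptions.
    """
    parent = list(range(len(normals)))
    cos_thresh = math.cos(math.radians(max_angle_deg))
    for a, b in edges:
        # Geometric prior: cosine of the angle between unit normals.
        cos_sim = sum(x * y for x, y in zip(normals[a], normals[b]))
        # Textural prior: Euclidean distance between RGB colors in [0, 1].
        color_dist = math.dist(colors[a], colors[b])
        if cos_sim >= cos_thresh and color_dist <= max_color_dist:
            ra, rb = find(parent, a), find(parent, b)
            if ra != rb:
                parent[rb] = ra
    # Relabel roots as consecutive primitive ids in order of first appearance.
    label_of = {}
    return [label_of.setdefault(find(parent, i), len(label_of))
            for i in range(len(normals))]

# Four coplanar points: two red, two blue, connected in a chain.
normals = [(0.0, 0.0, 1.0)] * 4
colors = [(1.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 0.0, 1.0), (0.0, 0.0, 1.0)]
edges = [(0, 1), (1, 2), (2, 3)]

geometry_only = group_primitives(normals, colors, edges,
                                 max_color_dist=float("inf"))
geometry_and_texture = group_primitives(normals, colors, edges)
print(geometry_only)         # one primitive: the two objects are fused
print(geometry_and_texture)  # two primitives: color separates them
```

With identical normals, the geometry-only run places all four points in one primitive, whereas the combined criterion splits them at the color boundary.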

Paper Structure

This paper contains 25 sections, 5 equations, 4 figures, 5 tables, 1 algorithm.

Figures (4)

  • Figure 1: Comparison of our SA3DIP with other methods. Methods like SAI3D (bottom) fail to distinguish instances with similar normals when computing superpoints, and these errors propagate to the final segmentation. Moreover, the part-level 2D segmentation transfers to 3D space, resulting in over-segmented 3D instances. We present a novel pipeline for segmenting any 3D instances, which overcomes these limitations by exploiting additional 3D priors: it incorporates both geometric and textural priors when computing superpoints and adds a 3D-space constraint provided by a 3D detector.
  • Figure 2: Overall pipeline. Our approach first integrates both geometric and textural priors for grouping 3D primitives (step A). Corresponding posed masks are generated using SAM. An affinity matrix is then computed from these 2D-3D results and serves as the edge weights (step B). Region growing and instance-aware refinement are conducted on the constructed scene graph, using a 3D box constraint to address over-segmentation while maintaining the fine-grained outcomes (step C).
  • Figure 3: Overview of our proposed ScanNetV2-INS. We present the new benchmark for 3D class-agnostic instance segmentation, which rectifies incomplete annotations and incorporates more instances based on ScanNetV2. Row (a) shows the comparison before and after revision, and row (b) illustrates the object counts per scene between the two benchmarks.
  • Figure 4: Visual comparison between our method and SAM3D [yang2023sam3d], SAMPro3D [xu2023sampro3d], and SAI3D [yin2023sai3d] on the ScanNetV2, ScanNetV2-INS, and ScanNet++ datasets. Across all datasets, our method shows the most robust and accurate segmentation.
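Steps B and C of the pipeline in Figure 2 (an affinity matrix used as edge weights, followed by region growing on the scene graph) can be sketched as below. This is a minimal illustration under stated assumptions, not the paper's algorithm: the functions `affinity` and `region_grow`, the "fraction of shared views with the same 2D mask" score, and the fixed threshold are all hypothetical, and the 3D box constraint and instance-aware refinement are not modelled.

```python
from collections import deque

def affinity(mask_ids_a, mask_ids_b):
    """Fraction of views, among those seeing both primitives, in which they
    fall in the same 2D mask. None marks a view where a primitive is not visible.
    Assumed scoring rule for illustration only."""
    shared = [(a, b) for a, b in zip(mask_ids_a, mask_ids_b)
              if a is not None and b is not None]
    if not shared:
        return 0.0
    return sum(a == b for a, b in shared) / len(shared)

def region_grow(view_masks, adjacency, threshold=0.5):
    """Breadth-first region growing over a primitive adjacency graph.

    view_masks[i][v] is the 2D mask id of primitive i in view v (or None).
    adjacency[i] lists the primitives that touch primitive i in 3D.
    """
    labels = [-1] * len(view_masks)
    next_label = 0
    for seed in range(len(view_masks)):
        if labels[seed] != -1:
            continue  # already absorbed by an earlier region
        labels[seed] = next_label
        queue = deque([seed])
        while queue:
            cur = queue.popleft()
            for nb in adjacency[cur]:
                # Grow across an edge only if its affinity weight is high enough.
                if labels[nb] == -1 and \
                        affinity(view_masks[cur], view_masks[nb]) >= threshold:
                    labels[nb] = next_label
                    queue.append(nb)
        next_label += 1
    return labels

# Three primitives observed in three views; mask ids are local to each view.
view_masks = [
    [1, 1, None],  # primitive 0 is not visible in view 2
    [1, 2, 5],
    [3, 4, 6],
]
adjacency = {0: [1], 1: [0, 2], 2: [1]}

instances = region_grow(view_masks, adjacency, threshold=0.5)
print(instances)
```

Primitives 0 and 1 share a mask in one of their two common views, so they merge at a threshold of 0.5, while primitive 2 never shares a mask with its neighbor and stays separate. Raising the threshold keeps all three apart, which mirrors how part-level 2D masks can leave a 3D instance over-segmented when no further constraint is applied.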