
ClusteringSDF: Self-Organized Neural Implicit Surfaces for 3D Decomposition

Tianhao Wu, Chuanxia Zheng, Tat-Jen Cham, Qianyi Wu

TL;DR

ClusteringSDF is a novel approach that achieves both segmentation and reconstruction in 3D via a neural implicit surface representation, specifically the Signed Distance Function (SDF), in which segmentation rendering is directly integrated with the volume rendering of neural implicit surfaces.

Abstract

3D decomposition/segmentation remains challenging because large-scale annotated 3D data is not readily available. Contemporary approaches typically leverage machine-generated 2D segments and integrate them to achieve 3D consistency. Most of these methods are based on NeRFs, and they share a potential weakness: the instance/semantic embedding features are derived from independent MLPs, which prevents the segmentation network from learning the objects' geometric details directly through radiance and density. In this paper, we propose ClusteringSDF, a novel approach that achieves both segmentation and reconstruction in 3D via a neural implicit surface representation, specifically the Signed Distance Function (SDF), in which segmentation rendering is directly integrated with the volume rendering of neural implicit surfaces. Although built on ObjectSDF++, ClusteringSDF no longer requires ground-truth segments for supervision; it retains the ability to reconstruct individual object surfaces while relying purely on noisy and inconsistent labels from pre-trained models. At the core of ClusteringSDF, we introduce a highly efficient clustering mechanism for lifting 2D labels to 3D. Experimental results on challenging scenes from the ScanNet and Replica datasets show that ClusteringSDF achieves performance competitive with the state of the art at a significantly reduced training time.
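To make "segmentation rendering integrated with volume rendering" concrete, the following Python sketch renders a c-channel segmentation for a single ray from per-object SDFs. It is a minimal illustration under stated assumptions, not the paper's implementation: the scene SDF is taken as the minimum over object SDFs (the object-compositional convention of ObjectSDF-style models), density follows a VolSDF-style Laplace-CDF transform, and the per-point "normalized SDF distribution" is assumed to be a softmax over negated object SDFs with temperature tau. The names laplace_density, render_segmentation, beta and tau are hypothetical.

```python
import numpy as np

def laplace_density(sdf, beta=0.1):
    """VolSDF-style density (assumed): sigma = (1/beta) * Psi_beta(-sdf),
    where Psi_beta is the CDF of a zero-mean Laplace distribution with scale beta."""
    s = -sdf
    e = np.exp(-np.abs(s) / beta)                  # overflow-safe for both branches
    cdf = np.where(s <= 0, 0.5 * e, 1.0 - 0.5 * e)
    return cdf / beta

def render_segmentation(object_sdfs, deltas, beta=0.1, tau=0.05):
    """Render a c-channel segmentation for one ray (hypothetical helper).

    object_sdfs: (S, c) SDF of each of c object channels at S samples along the ray.
    deltas:      (S,) spacing between consecutive samples.
    """
    scene_sdf = object_sdfs.min(axis=1)            # object-compositional scene SDF
    sigma = laplace_density(scene_sdf, beta)       # (S,) density derived from geometry
    alpha = 1.0 - np.exp(-sigma * deltas)          # per-sample opacity
    trans = np.cumprod(np.concatenate([[1.0], 1.0 - alpha[:-1]]))  # transmittance T_i
    weights = trans * alpha                        # standard volume-rendering weights
    # Assumed per-point distribution over channels: softmax(-sdf / tau).
    logits = -object_sdfs / tau
    logits -= logits.max(axis=1, keepdims=True)    # numerical stability
    probs = np.exp(logits)
    probs /= probs.sum(axis=1, keepdims=True)
    return weights @ probs                         # (c,) rendered segmentation

# Toy ray through two half-space "objects": object 0's surface at t = 1.0,
# object 1's surface farther along the ray at t = 1.5.
t = np.linspace(0.0, 2.0, 64)
object_sdfs = np.stack([1.0 - t, 1.5 - t], axis=1)
deltas = np.full_like(t, t[1] - t[0])
seg = render_segmentation(object_sdfs, deltas)
print(seg)  # mass concentrates on channel 0, the first surface the ray hits
```

In this sketch the same weights that would composite color also composite the channel distribution, so segmentation gradients reach the SDF geometry instead of an independent MLP, which is the NeRF-pipeline weakness the abstract points to.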

Paper Structure

This paper contains 43 sections, 15 equations, 11 figures, 3 tables.

Figures (11)

  • Figure 1: Taking RGB images of a scene and the corresponding 2D labels from a pre-trained segmentation model, our ClusteringSDF reconstructs the scene surface while fusing these inconsistent segments into coherent, more accurate 3D segments. Furthermore, it learns an object-compositional neural implicit representation and can reconstruct the surfaces of individual objects purely from these noisy labels.
  • Figure 2: Overview of ClusteringSDF. It is designed to fuse inconsistent 2D segments into 3D while reconstructing object geometry by predicting per-object SDFs. To achieve this, we sample rays from a single 2D segment map and split them into $N$ groups corresponding to the $N$ distinct 2D labels $\{C_1,\dots,C_N\}$. The proposed $\mathcal{L}_{\text{diff}}$ then leverages normalized SDF distributions spanning $c$ channels for individual objects (represented by different colors) and keeps the cluster centers apart, while $\mathcal{L}_{\text{onehot}}$ further encourages the predicted clusters to be one-hot.
  • Figure 3: Examples of semantic and instance segmentation on novel views. Our ClusteringSDF achieves results highly competitive with previous 3D segmentation methods. Note that our instance and semantic results differ slightly because we train two separate networks for the two tasks. Due to space limits, multi-view consistent segmentation results are provided in the supplementary materials. (Zoom in to see the details.)
  • Figure 4: Examples of our segmentation results in 3D. Detailed comparisons with existing methods on 3D consistency are provided in the supplementary materials.
  • Figure 5: Qualitative results for ablation study.
  • ...and 6 more figures
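The clustering objective described in the Figure 2 caption can be sketched in Python as follows. This is a hedged illustration, not the paper's loss: we assume L_diff combines an intra-group "pull" (rays sharing a 2D label should match their group's mean distribution, i.e. its cluster center) with a "push" that penalizes the dot product between different centers, and we stand in for L_onehot with the mean per-ray entropy, which vanishes only for one-hot distributions. The function name clustering_losses is hypothetical; probs would be per-ray channel distributions obtained by volume rendering.

```python
import numpy as np

def clustering_losses(probs, labels, eps=1e-8):
    """Hypothetical clustering losses for rays drawn from ONE 2D segment map.

    probs:  (R, c) normalized per-ray distributions over c channels (rows sum to 1).
    labels: (R,) integer 2D segment IDs; their values are arbitrary per image.
    """
    groups = np.unique(labels)                     # the N distinct labels C_1..C_N (sorted)
    centers = np.stack([probs[labels == g].mean(axis=0) for g in groups])  # (N, c)
    idx = np.searchsorted(groups, labels)          # row of each ray's center in `centers`
    # Pull (assumed): rays sharing a 2D label should agree with their cluster center.
    pull = np.mean(np.sum((probs - centers[idx]) ** 2, axis=1))
    # Push (assumed): different centers should not overlap, so off-diagonal dot products -> 0.
    gram = centers @ centers.T
    n = len(groups)
    push = (gram.sum() - np.trace(gram)) / max(n * (n - 1), 1)
    l_diff = pull + push
    # One-hot stand-in: mean entropy, ~0 only when every ray commits to one channel.
    l_onehot = np.mean(-np.sum(probs * np.log(probs + eps), axis=1))
    return l_diff, l_onehot

labels = np.array([0, 0, 5, 5])                    # two 2D segments in one view
good = np.array([[1, 0, 0], [1, 0, 0], [0, 1, 0], [0, 1, 0]], dtype=float)
bad = np.full((4, 3), 1.0 / 3.0)                   # every ray undecided
good_diff, good_onehot = clustering_losses(good, labels)
bad_diff, bad_onehot = clustering_losses(bad, labels)
print(good_diff, good_onehot)  # both essentially zero
print(bad_diff, bad_onehot)    # approximately 1/3 and log(3)
```

Because these losses only compare rays within a single segment map, the 2D label IDs never need to match the 3D channel indices or stay consistent across views; the labels 0 and 5 above are arbitrary. This permutation invariance is what makes a clustering formulation a natural fit for noisy, view-inconsistent labels.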