Novel class discovery meets foundation models for 3D semantic segmentation

Luigi Riz, Cristiano Saltori, Yiming Wang, Elisa Ricci, Fabio Poiesi

TL;DR

This work addresses Novel Class Discovery (NCD) for 3D point cloud semantic segmentation, a task made challenging by the presence of multiple novel classes in each scene and by the sparsity of 3D data. It introduces SNOPS, which combines online prototype-based pseudo-labelling, uncertainty-aware training, a class-balanced queue, and semantic distillation from a 3D foundation model to jointly learn base and novel classes. Compared with 2D NCD baselines adapted to 3D and with zero-shot OpenScene prompts, SNOPS delivers substantial improvements on SemanticPOSS, SemanticKITTI, and S3DIS, measured under an explicit evaluation protocol. The approach shows how semantically aligned feature spaces and online clustering can yield robust open-set performance in 3D semantic segmentation, with practical implications for scalable scene understanding. Future work includes relaxing the assumption that the number of novel classes is known and addressing the domain gaps introduced by distillation.

Abstract

The task of Novel Class Discovery (NCD) in semantic segmentation entails training a model able to accurately segment unlabelled (novel) classes, relying on the available supervision from annotated (base) classes. Although extensively investigated in 2D image data, the extension of the NCD task to the domain of 3D point clouds represents a pioneering effort, characterized by assumptions and challenges that are not present in the 2D case. This paper represents an advancement in the analysis of point cloud data in four directions. Firstly, it introduces the novel task of NCD for point cloud semantic segmentation. Secondly, it demonstrates that directly transposing the only existing NCD method for 2D image semantic segmentation to 3D data yields suboptimal results. Thirdly, a new NCD approach based on online clustering, uncertainty estimation, and semantic distillation is presented. Lastly, a novel evaluation protocol is proposed to rigorously assess the performance of NCD in point cloud semantic segmentation. Through comprehensive evaluations on the SemanticKITTI, SemanticPOSS, and S3DIS datasets, the paper demonstrates substantial superiority of the proposed method over the considered baselines.

Paper Structure (18 sections, 7 equations, 10 figures, 11 tables)

Figures (10)

  • Figure 1: SNOPS addresses the novel class discovery task in 3D point cloud semantic segmentation by leveraging the knowledge of ground-truth labels (for base classes) and the auxiliary supervision from a foundation model (for novel classes) to learn the correct semantic segmentation of both base and novel points.
  • Figure 2: Overview of SNOPS. We extract point-level features $\mathcal{F}$ with the shared backbone $f_g$. $\mathcal{F}$ are used to obtain pseudo-labels in the online pseudo-labelling block. We forward $\mathcal{F}$ through a novel $f_n$ and a base $f_b$ segmentation head to obtain point-wise predictions. We also pass $\mathcal{F}$ through a projection layer $f_s$ that produces point-wise features for novel points. We align such point descriptors to the ones output by a frozen auxiliary network $f_a$, a large 3D vision model. The network is optimised by minimising the sum of a segmentation loss and an alignment loss.
  • Figure 3: Overview of the different outputs after the input point cloud $\mathcal{X}$ undergoes two different random augmentations, required for the generation of self-supervised pseudo-labels.
  • Figure 4: Evolution of the adaptive selection threshold $\tau_c$ when discovering four novel classes on S3DIS.
  • Figure 5: Overview of EUMS$^\dag$, our adaptation of the method proposed by Zhao et al. (2022). We first pre-train $f_g$ and $f_b$ considering only the base points in each point cloud. Using $f_g$, we extract the features of the novel points in each scene, which are then filtered with the selection function $\Psi(\cdot)$. Then, we produce the pseudo-labels for the selected novel points by using the k-means algorithm. Lastly, we plug a new segmentation head $f_c$ into $f_g$ and fine-tune the complete model on both novel and base points, considering pseudo-labels and ground-truth labels respectively.
  • ...and 5 more figures
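
The Figure 2 caption states that the network is optimised by minimising the sum of a segmentation loss and an alignment loss. The sketch below illustrates that structure only. The concrete choices are assumptions made for illustration and are not taken from the paper: softmax cross-entropy as the segmentation loss, cosine distance as the alignment loss, equal weighting of the two terms, and the hypothetical names `total_loss`, `projected` (outputs of $f_s$), `auxiliary` (outputs of the frozen $f_a$), and `novel_mask`.

```python
import math


def cross_entropy(logits, target):
    """Softmax cross-entropy for a single point; `target` is a class index."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum - logits[target]


def cosine_distance(u, v):
    """1 - cosine similarity between two feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / max(norm, 1e-12)  # guard against zero-length vectors


def total_loss(logits, targets, projected, auxiliary, novel_mask):
    """Sum of a segmentation term and an alignment term (assumed forms).

    logits     : per-point class scores (base and novel classes together)
    targets    : per-point labels (ground truth for base points,
                 pseudo-labels for novel points)
    projected  : per-point features from the projection layer
    auxiliary  : per-point features from the frozen auxiliary network
    novel_mask : True where a point is treated as novel
    """
    # Segmentation term: averaged over all points.
    seg = sum(cross_entropy(l, t) for l, t in zip(logits, targets)) / len(targets)

    # Alignment term: averaged over novel points only.
    novel = [i for i, is_novel in enumerate(novel_mask) if is_novel]
    align = sum(cosine_distance(projected[i], auxiliary[i]) for i in novel)
    align /= max(len(novel), 1)

    return seg + align
```

In this sketch the alignment term is computed on novel points only, following the caption's statement that $f_s$ produces point-wise features for novel points; how the two terms are actually weighted and defined should be checked against the paper itself.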