CP-VoteNet: Contrastive Prototypical VoteNet for Few-Shot Point Cloud Object Detection

Xuejing Li, Weijia Zhang, Chao Ma

TL;DR

This work introduces contrastive semantics mining, which enables the network to extract discriminative categorical features by constructing positive and negative pairs within training batches, and proposes to impose a contrastive relationship at the geometric primitive level.

Abstract

Few-shot point cloud 3D object detection (FS3D) aims to identify and localise objects of novel classes from point clouds, using knowledge learnt from annotated base classes and novel classes with very few annotations. Thus far, this challenging task has been approached using prototype learning, but the performance remains far from satisfactory. We find that in existing methods, the prototypes are only loosely constrained and lack fine-grained awareness of the semantic and geometrical correlations embedded within the point cloud space. To mitigate these issues, we propose to leverage the inherent contrastive relationship within the semantic and geometrical subspaces to learn more refined and generalisable prototypical representations. To this end, we first introduce contrastive semantics mining, which enables the network to extract discriminative categorical features by constructing positive and negative pairs within training batches. Meanwhile, since point features representing local patterns can be clustered into geometric components, we further propose to impose a contrastive relationship at the primitive level. Through refined primitive geometric structures, the transferability of feature encoding from base to novel classes is significantly enhanced. The above designs and insights lead to our novel Contrastive Prototypical VoteNet (CP-VoteNet). Extensive experiments on two FS3D benchmarks, FS-ScanNet and FS-SUNRGBD, demonstrate that CP-VoteNet surpasses current state-of-the-art methods by considerable margins across different FS3D settings. Further ablation studies corroborate the rationale and effectiveness of our designs.
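
To make the idea of in-batch positive and negative pairs concrete, the following is a minimal sketch of a supervised contrastive (SupCon-style) loss over instance features. It is not the paper's loss, which this summary does not state; the function name `supervised_contrastive_loss`, the `temperature` value, and the use of cosine similarity are all assumptions made for illustration.

```python
import numpy as np

def supervised_contrastive_loss(features, labels, temperature=0.1):
    """Illustrative SupCon-style loss (an assumption, not the paper's exact form).

    features: (N, D) instance features from one training batch.
    labels:   (N,) integer class ids.
    Samples sharing a label form positive pairs; all other pairs are negatives.
    """
    features = np.asarray(features, dtype=np.float64)
    labels = np.asarray(labels)
    # L2-normalise so that dot products are cosine similarities.
    z = features / (np.linalg.norm(features, axis=1, keepdims=True) + 1e-12)
    logits = z @ z.T / temperature
    n = len(labels)
    not_self = ~np.eye(n, dtype=bool)
    positives = (labels[:, None] == labels[None, :]) & not_self
    # Log-softmax over all other samples (self excluded), shifted for stability.
    logits = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(logits) * not_self
    log_prob = logits - np.log(exp.sum(axis=1, keepdims=True))
    pos_count = positives.sum(axis=1)
    valid = pos_count > 0  # anchors with no positive in the batch are skipped
    if not valid.any():
        return 0.0
    mean_log_prob_pos = (log_prob * positives).sum(axis=1)[valid] / pos_count[valid]
    return float(-mean_log_prob_pos.mean())

# Toy batch: two tight clusters in 2-D feature space.
feats = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]])
loss_matching = supervised_contrastive_loss(feats, [0, 0, 1, 1])
loss_mismatched = supervised_contrastive_loss(feats, [0, 1, 0, 1])
```

When the labels agree with the feature clusters, the loss is small; when they cut across the clusters, it is large. Minimising it therefore pulls same-category features together and pushes different-category features apart.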

Paper Structure (23 sections, 5 equations, 3 figures, 6 tables)

Figures (3)

  • Figure 1: Contrastive learning at the semantic level (left) and geometric level (right). Semantic contrastive learning requires that the instance features of positive pairs belonging to the same category be similar, and those of negative pairs belonging to different categories be dissimilar. Primitive contrastive learning demands that features of the same geometric components (i.e., faces within green circles) be similar, and those of different geometric components (i.e., edges and corners within red circles) be dissimilar.
  • Figure 2: The overall framework of the proposed CP-VoteNet. Positive and negative pairs are constructed within a minibatch for contrastive learning. Features $F_w$ assigned to different geometric prototypes are considered as different geometric components, which, after passing through a projection layer $proj_{\text{s}}(\cdot)$, engage in primitive contrastive learning. The instance features of the support instances, serving as semantic prototypes and passing through a projection layer $proj_{\text{p}}(\cdot)$, undergo semantic contrastive learning.
  • Figure 3: Visualisation of few-shot 3D object detection results on point clouds by our method and Prototypical VoteNet [prototypicalFSL] on novel split-1 of FS-ScanNet with $k=5$.
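
The Figure 2 caption says that point features assigned to different geometric prototypes are treated as different geometric components. A minimal sketch of one way to build such primitive-level pairs follows. The nearest-prototype rule by cosine similarity and the helper names `assign_to_prototypes` and `pair_masks` are hypothetical; the actual assignment rule is not given in this summary.

```python
import numpy as np

def assign_to_prototypes(point_features, prototypes):
    """Assign each point feature to its most similar prototype.

    Assumption: similarity is cosine similarity; the paper may use another rule.
    point_features: (N, D); prototypes: (K, D). Returns (N,) prototype indices.
    """
    f = point_features / (np.linalg.norm(point_features, axis=1, keepdims=True) + 1e-12)
    p = prototypes / (np.linalg.norm(prototypes, axis=1, keepdims=True) + 1e-12)
    return np.argmax(f @ p.T, axis=1)

def pair_masks(assignments):
    """Build boolean (N, N) masks of positive and negative pairs.

    Positive: two distinct points assigned to the same prototype.
    Negative: two points assigned to different prototypes.
    """
    same = assignments[:, None] == assignments[None, :]
    not_self = ~np.eye(len(assignments), dtype=bool)
    return same & not_self, ~same

# Toy example: two prototypes and three point features in 2-D.
prototypes = np.array([[1.0, 0.0], [0.0, 1.0]])
point_features = np.array([[2.0, 0.1], [0.2, 3.0], [5.0, 1.0]])
assignments = assign_to_prototypes(point_features, prototypes)
positive_mask, negative_mask = pair_masks(assignments)
```

The resulting masks could then be passed to any pairwise contrastive objective, with the prototype index acting as a pseudo-label for the geometric component.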