Dual-Perspective Knowledge Enrichment for Semi-Supervised 3D Object Detection

Yucheng Han, Na Zhao, Weiling Chen, Keng Teck Ma, Hanwang Zhang

TL;DR

DPKE tackles semi-supervised 3D object detection in cluttered indoor scenes by introducing dual-perspective knowledge enrichment: data-perspective augmentation via class-probabilistic sampling and feature-perspective regularization through geometry-aware proposal matching. Built on a Mean-Teacher framework with a VoteNet backbone, DPKE pastes probabilistically sampled labeled proposals into scenes and enforces geometry-guided consistency between student and teacher proposal features, addressing both data diversity and pseudo-label quality. The approach achieves state-of-the-art results on ScanNet and SUN RGB-D across multiple label ratios, outperforming SESS, 3DIoUMatch, and semi-sampling baselines, and it remains robust under varying augmentation and threshold settings. The work reduces annotation costs for indoor 3D perception and provides practical insights for exploiting unlabeled 3D data with dual-level supervision, with code to be released publicly.

Abstract

Semi-supervised 3D object detection is a promising yet under-explored direction to reduce data annotation costs, especially for cluttered indoor scenes. A few prior works, such as SESS and 3DIoUMatch, attempt to solve this task by utilizing a teacher model to generate pseudo-labels for unlabeled samples. However, the availability of unlabeled samples in the 3D domain is relatively limited compared to its 2D counterpart due to the greater effort required to collect 3D data. Moreover, the loose consistency regularization in SESS and the restrictive pseudo-label selection strategy in 3DIoUMatch lead to either low-quality supervision or a limited number of pseudo-labels. To address these issues, we present a novel Dual-Perspective Knowledge Enrichment approach named DPKE for semi-supervised 3D object detection. Our DPKE enriches the knowledge of limited training data, particularly unlabeled data, from two perspectives: the data perspective and the feature perspective. Specifically, from the data perspective, we propose a class-probabilistic data augmentation method that augments the input data with additional instances based on the varying distribution of class probabilities. From the feature perspective, we design a geometry-aware feature matching method that regularizes feature-level similarity between object proposals from the student and teacher models. Extensive experiments on the two benchmark datasets demonstrate that our DPKE achieves superior performance over existing state-of-the-art approaches under various label ratio conditions. The source code will be made available to the public.
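
The class-probabilistic augmentation described above can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the temperature parameter, the instance bank, and in particular the choice to favor classes with lower mean confidence are all assumptions made for illustration. The abstract only states that instances are added according to the distribution of class probabilities.

```python
import numpy as np


def class_sampling_probs(mean_class_probs, temperature=1.0):
    """Turn per-class mean confidences into class sampling probabilities.

    Assumption (not from the paper): classes the student is less
    confident about receive a higher sampling probability.
    """
    p = np.asarray(mean_class_probs, dtype=np.float64)
    scores = (1.0 - p) / temperature
    scores = scores - scores.max()  # subtract the max for numerical stability
    weights = np.exp(scores)
    return weights / weights.sum()


def sample_instances(instance_bank, probs, num_instances, rng):
    """Draw a class per slot, then a random labeled instance of that class."""
    classes = rng.choice(len(probs), size=num_instances, p=probs)
    picked = []
    for c in classes:
        candidates = instance_bank.get(int(c), [])
        if candidates:  # skip classes with no stored instances
            picked.append((int(c), candidates[rng.integers(len(candidates))]))
    return picked


# Hypothetical example: three classes with high, medium, and low confidence.
rng = np.random.default_rng(0)
probs = class_sampling_probs([0.9, 0.5, 0.1])
bank = {0: ["chair_a"], 1: ["table_a", "table_b"], 2: ["lamp_a"]}
picked = sample_instances(bank, probs, num_instances=4, rng=rng)
```

In a real pipeline the sampled instances would be point-cloud crops with their boxes, pasted into the scene before the scene-level augmentation; the strings here are placeholders.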

Paper Structure

This paper contains 34 sections, 4 equations, 7 figures, and 6 tables.

Figures (7)

  • Figure 1: Dataset statistics. The orange bars represent the number of samples in each dataset, while the blue bars represent the number of objects per sample. The 3D indoor dataset (ScanNet) contains many more objects per scene than the 3D outdoor dataset (KITTI) or the 2D dataset (Pascal).
  • Figure 2: The overall framework of our proposed DPKE. Before the original scene-level augmentation operator, we introduce a new data augmentation module based on the normalized logits of the student model to increase the diversity of the input data. The student model is updated with a loss function that combines the 3DIoUMatch loss (Wang et al., 2021) with our geometry-aware feature matching loss, which exploits knowledge from teacher predictions with lower confidence.
  • Figure 3: The distribution of samples providing different kinds of supervision from the teacher model predictions. Invalid supervision refers to samples that fail to match the ground truth. The remaining samples can be divided into those providing strong supervision (e.g., 3DIoUMatch) and those providing weak supervision (e.g., SESS or our Geometry-aware Feature Matching).
  • Figure 4: Per-class average precision (AP) comparison when applying different data augmentation strategies to 3DIoUMatch. The experiment is conducted on ScanNet with a 5% label ratio.
  • Figure 5: Visualization results on ScanNet and SUN RGB-D. The models are trained on ScanNet 5% and SUN RGB-D 10%, respectively. The areas within the green circles contain the ground truth and correct predictions, while those in the red circles contain wrong predictions.
  • ...and 2 more figures
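
The geometry-aware feature matching loss mentioned in the Figure 2 caption can be sketched roughly as follows. This is an illustrative approximation, not the paper's formulation: matching by nearest box center, the `max_dist` threshold, and the cosine-similarity loss are all assumptions, and the function name is hypothetical.

```python
import numpy as np


def geometry_matched_feature_loss(s_centers, s_feats, t_centers, t_feats,
                                  max_dist=0.3):
    """Match student proposals to teacher proposals by center distance,
    then penalize dissimilar features on the matched pairs.

    Assumptions (not from the paper): nearest-center matching, a fixed
    distance threshold, and a (1 - cosine similarity) penalty.
    """
    s_centers = np.asarray(s_centers, dtype=np.float64)
    t_centers = np.asarray(t_centers, dtype=np.float64)
    s_feats = np.asarray(s_feats, dtype=np.float64)
    t_feats = np.asarray(t_feats, dtype=np.float64)

    # Pairwise center distances, shape (num_student, num_teacher).
    dists = np.linalg.norm(s_centers[:, None, :] - t_centers[None, :, :], axis=-1)
    nearest = dists.argmin(axis=1)
    keep = dists[np.arange(len(s_centers)), nearest] <= max_dist
    if not keep.any():
        return 0.0  # no geometrically valid pair, so no feature supervision

    s = s_feats[keep]
    t = t_feats[nearest[keep]]
    s = s / (np.linalg.norm(s, axis=1, keepdims=True) + 1e-8)
    t = t / (np.linalg.norm(t, axis=1, keepdims=True) + 1e-8)
    cosine = (s * t).sum(axis=1)
    return float((1.0 - cosine).mean())


# Hypothetical example: the first student proposal has a close teacher match
# with identical features; the second has no teacher within max_dist.
loss = geometry_matched_feature_loss(
    s_centers=[[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]],
    s_feats=[[1.0, 0.0], [0.0, 1.0]],
    t_centers=[[0.1, 0.0, 0.0], [5.0, 5.0, 5.0]],
    t_feats=[[1.0, 0.0], [1.0, 0.0]],
)
```

Because the pairing depends only on geometry and not on a confidence threshold, a loss of this shape can still draw a training signal from lower-confidence teacher proposals, which is the role the caption assigns to the feature matching term.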