Uncertainty Meets Diversity: A Comprehensive Active Learning Framework for Indoor 3D Object Detection

Jiangyi Wang, Na Zhao

TL;DR

The paper tackles the high annotation cost of indoor 3D object detection with an active learning framework that combines uncertainty and diversity criteria. Its epistemic-uncertainty estimator covers both inaccurate detections and undetected objects: a localization-aware score handles the former, a predictor of the number of undetected objects handles the latter, and the two are merged through a normalized product. For diversity, it proposes a Class-aware Adaptive Prototype (CAP) bank that dynamically allocates per-class prototypes to capture intra-class variance and scene-type distribution, and it selects diverse samples by solving a prototype-histogram optimization with a partitioned, greedy approach. Evaluated on SUN RGB-D and ScanNetV2 with the CAGroup3D detector, the method outperforms active learning baselines and reaches over 85% of fully-supervised performance with only 10% of annotations. The approach lowers labeling effort for indoor 3D perception and may extend to other indoor sensing tasks.

Abstract

Active learning has emerged as a promising approach to reduce the substantial annotation burden in 3D object detection tasks, spurring several initiatives in outdoor environments. However, its application in indoor environments remains unexplored. Compared to outdoor 3D datasets, indoor datasets face significant challenges, including fewer training samples per class, a greater number of classes, more severe class imbalance, and more diverse scene types and intra-class variances. This paper presents the first study on active learning for indoor 3D object detection, where we propose a novel framework tailored for this task. Our method incorporates two key criteria - uncertainty and diversity - to actively select the most ambiguous and informative unlabeled samples for annotation. The uncertainty criterion accounts for both inaccurate detections and undetected objects, ensuring that the most ambiguous samples are prioritized. Meanwhile, the diversity criterion is formulated as a joint optimization problem that maximizes the diversity of both object class distributions and scene types, using a new Class-aware Adaptive Prototype (CAP) bank. The CAP bank dynamically allocates representative prototypes to each class, helping to capture varying intra-class diversity across different categories. We evaluate our method on SUN RGB-D and ScanNetV2, where it outperforms baselines by a significant margin, achieving over 85% of fully-supervised performance with just 10% of the annotation budget.
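
The summary above says the two uncertainty signals are unified via normalized product scoring. As a minimal sketch of what that could look like, the snippet below min-max normalizes two per-scene scores over the unlabeled pool and multiplies them. The choice of min-max normalization and the names `min_max` and `unified_uncertainty` are assumptions for illustration and are not taken from the paper.

```python
def min_max(values):
    """Rescale a list of numbers to [0, 1]; a constant list maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def unified_uncertainty(loc_scores, undetected_counts):
    """Combine two per-scene signals by multiplying their normalized values.

    loc_scores: hypothetical localization-aware uncertainty, one value per scene.
    undetected_counts: hypothetical predicted number of undetected objects per scene.
    Normalization scheme (min-max) is an assumption, not the paper's stated choice.
    """
    a = min_max(loc_scores)
    b = min_max(undetected_counts)
    return [x * y for x, y in zip(a, b)]


scores = unified_uncertainty([1.0, 2.0, 3.0], [0, 2, 4])
```

A product rewards scenes that score high on both signals at once, whereas a sum would let one large signal dominate; whether the paper uses this exact normalization is not stated in the text here.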

Paper Structure

This paper contains 11 sections, 10 equations, 5 figures, 4 tables.

Figures (5)

  • Figure 1: Challenges in active learning for indoor 3D object detection, including both uncertainty and diversity aspects. We show two indoor scenes on ScanNetV2, with red boxes for ground truths and blue for predictions (10% data using CAGroup3D detector), displaying only 'chair' and 'door' categories for clarity. For uncertainty, the presence of undetected objects and inaccurate detections undermines the quality of uncertainty estimation. For diversity, high scene-type diversity and varying intra-class variances in indoor environments complicate diverse sample selection. This work presents a hybrid approach to address these challenges.
  • Figure 2: Overview of our proposed AL framework for indoor 3D object detection, exploiting both uncertainty and diversity. In the $r$-th round, we first construct a candidate pool of size $\delta\cdot B_r$ by taking the Top-$K$ samples under the unified uncertainty score, which accounts for both inaccurate detections and undetections. Then, we jointly optimize the intra-class and scene-type diversity to choose the $r$-th selected set $\mathcal{D}_r$. Lastly, we retrain the model on $\mathcal{D}_r$ together with the previously labeled point clouds $\mathcal{D}_{L}^{r-1}$, and repeat rounds until the total labeled dataset reaches the annotation budget $B$.
  • Figure 3: mAP@0.25 (%) score of the proposed method and AL baselines on SUN RGB-D and ScanNetV2 benchmarks.
  • Figure 4: Qualitative comparison between random sampling and our proposed active learning method on ScanNetV2 val set.
  • Figure 5: Effects of hyper-parameters in our proposed method on SUN RGB-D dataset.
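
The two-stage round described in the Figure 2 caption can be sketched roughly as follows: keep the $\delta\cdot B_r$ most uncertain scenes as a candidate pool, then greedily pick $B_r$ of them for diversity. The diversity objective used here (entropy of the summed prototype histogram) is a stand-in chosen for illustration; the paper's actual prototype-histogram optimization and its partitioning step are not reproduced. The names `entropy` and `select_round` and the input layout are hypothetical.

```python
import math


def entropy(hist):
    """Shannon entropy of a non-negative count histogram (0.0 if empty)."""
    total = sum(hist)
    if total == 0:
        return 0.0
    return -sum((h / total) * math.log(h / total) for h in hist if h > 0)


def select_round(uncertainty, histograms, budget, delta):
    """Pick `budget` scene indices for one active learning round.

    uncertainty: unified uncertainty score per unlabeled scene.
    histograms: per-scene prototype-assignment counts (hypothetical input format).
    budget: B_r, the number of scenes to label this round.
    delta: pool expansion factor, so the candidate pool has delta * B_r scenes.
    """
    # Stage 1: Top-K by uncertainty, with K = delta * B_r.
    pool_size = min(len(uncertainty), int(delta * budget))
    ranked = sorted(range(len(uncertainty)),
                    key=lambda i: uncertainty[i], reverse=True)
    candidates = ranked[:pool_size]

    # Stage 2: greedy selection; entropy is an assumed proxy for diversity.
    selected = []
    running = [0] * len(histograms[0])
    while candidates and len(selected) < budget:
        best = max(
            candidates,
            key=lambda i: entropy([r + h for r, h in zip(running, histograms[i])]),
        )
        selected.append(best)
        candidates.remove(best)
        running = [r + h for r, h in zip(running, histograms[best])]
    return selected


picked = select_round(
    uncertainty=[0.9, 0.8, 0.7, 0.1],
    histograms=[[3, 0], [2, 0], [0, 2], [0, 5]],
    budget=2,
    delta=1.5,
)
```

In this toy input, scene 3 is excluded by the uncertainty filter, and the greedy step prefers scene 2 over scene 1 because it adds mass to a prototype that the first pick did not cover.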