Open-CRB: Towards Open World Active Learning for 3D Object Detection

Zhuoxiao Chen, Yadan Luo, Zixin Wang, Zijian Wang, Xin Yu, Zi Huang

TL;DR

The Open-CRB framework integrates Open Label Conciseness (OLC) with CRB, the authors' preliminary active learning (AL) method designed specifically for 3D object detection. Compared to state-of-the-art baselines, it demonstrates superiority and flexibility in recognizing both novel and known classes with very limited labeling costs.

Abstract

LiDAR-based 3D object detection has recently seen significant advancements through active learning (AL), attaining satisfactory performance by training on a small fraction of strategically selected point clouds. However, in real-world deployments where streaming point clouds may include unknown or novel objects, the ability of current AL methods to capture such objects remains unexplored. This paper investigates a more practical and challenging research task: Open World Active Learning for 3D Object Detection (OWAL-3D), aimed at acquiring informative point clouds with new concepts. To tackle this challenge, we propose a simple yet effective strategy called Open Label Conciseness (OLC), which mines novel 3D objects with minimal annotation costs. Our empirical results show that OLC successfully adapts the 3D detection model to the open world scenario with just a single round of selection. Any generic AL policy can then be integrated with the proposed OLC to efficiently address the OWAL-3D problem. Based on this, we introduce the Open-CRB framework, which seamlessly integrates OLC with our preliminary AL method, CRB, designed specifically for 3D object detection. We develop a comprehensive codebase for easy reproduction and future research, supporting 15 baseline methods (i.e., active learning, out-of-distribution detection and open world detection), 2 types of modern 3D detectors (i.e., one-stage SECOND and two-stage PV-RCNN) and 3 benchmark 3D datasets (i.e., KITTI, nuScenes and Waymo). Extensive experiments show that the proposed Open-CRB demonstrates superiority and flexibility in recognizing both novel and known classes with very limited labeling costs, compared to state-of-the-art baselines. Source code is available at https://github.com/Luoyadan/CRB-active-3Ddet/tree/Open-CRB.
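
The abstract notes that any generic AL policy can be plugged into the selection step. As a rough illustration of what one pool-based selection round looks like, the sketch below ranks an unlabeled pool with an arbitrary scoring function and takes the top items within a budget. This is not the OLC or CRB criterion: `select_round`, `score_fn`, the budget, and the point-cloud IDs are all hypothetical names chosen for this example.

```python
from typing import Callable, List, Sequence, Tuple


def select_round(
    unlabeled_ids: Sequence[str],
    score_fn: Callable[[str], float],
    budget: int,
) -> Tuple[List[str], List[str]]:
    """Pick the `budget` highest-scoring samples from the unlabeled pool.

    Returns (selected, remaining). `score_fn` stands in for any AL policy
    (hypothetical placeholder, not the paper's criterion).
    """
    if budget < 0:
        raise ValueError("budget must be non-negative")
    # sorted() is stable, so samples with equal scores keep pool order.
    ranked = sorted(unlabeled_ids, key=score_fn, reverse=True)
    selected = list(ranked[:budget])
    chosen = set(selected)
    # Everything not selected stays in the pool for later rounds.
    remaining = [i for i in unlabeled_ids if i not in chosen]
    return selected, remaining


# Toy scores for three hypothetical point clouds.
scores = {"pc_a": 0.2, "pc_b": 0.9, "pc_c": 0.5}
selected, remaining = select_round(list(scores), scores.__getitem__, budget=2)
```

In a full loop, the selected point clouds would be sent for annotation, added to the labeled set, and the detector retrained before the next round; swapping `score_fn` is how a different policy would be substituted in this sketch.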

Paper Structure (18 sections, 9 equations, 8 figures, 2 tables, 1 algorithm)

Figures (8)

  • Figure 1: The illustration of the Open World Active Learning for 3D Object Detection (OWAL-3D) and conventional tasks. In traditional closed world 3D detection (a), pre-trained 3D detectors struggle to localize and recognize objects from new classes (i.e., out-of-distribution (OOD)) in an open world context. Generic active learning (b) focuses on known categories, failing to select point clouds that potentially contain OODs. To address this, we introduce OWAL-3D (c), a framework that selectively acquires and labels a small subset of point clouds which are more likely to contain novel concepts using an Open World Active Learning (AL) policy. This approach enables the 3D detection model to efficiently generalize to new scenes containing novel object categories while significantly reducing time and cost.
  • Figure 2: Upper: The overall framework of the proposed Open-CRB for OWAL-3D. Lower: The illustration of the proposed open world AL policy, Open Label Conciseness (OLC), which is designed for active selection from an unlabeled open world pool. The left bar plots report the annotation costs of the baseline methods and the proposed Open-CRB in the first selection round, along with the detection performance after training on the selected point clouds. The visualized point clouds in the middle and right illustrate the selection criteria (Eq. \ref{equ:final_entropy}), guided by two key relationships (Remark \ref{rmk:3}). The first relationship ensures a harmonic balance among the confidences associated with different predicted classes, promoting diversity and minimizing redundancy within the selected point clouds. The second relationship is inversely proportional, linking the number of bounding boxes to confidence levels. This relationship either 1) encourages exploration of unknown objects when low-confidence predictions are abundant, or 2) reduces the number of bounding boxes when the likelihood of unknown objects is low. These dual relationships work in tandem to select point clouds that include concise and high-quality known labels, and more unknown labels. The detailed algorithm is summarized in Algorithm \ref{alg:open-crb}.
  • Figure 3: OWAL-3D performance (3D and BEV mAP$_{unk}$ scores) comparisons on unknown classes of Open-CRB and baselines on the KITTI dataset.
  • Figure 4: OWAL-3D performance (mAP$_{H}$: 3D and BEV harmonic mean of known mAP and unknown mAP) comparisons on all the classes of Open-CRB and baselines on the KITTI dataset.
  • Figure 5: Left (two scatter plots): OWAL-3D performance (mAP$_{unk}$ and mAP$_{H}$) comparisons of Open-CRB and baselines on the nuScenes dataset. Right (bar plot): The cumulative number of bounding boxes selected from the nuScenes dataset as the number of active learning selection rounds increases, under the OWAL-3D setting.
  • ...and 3 more figures
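
The Figure 4 caption defines mAP$_{H}$ as the harmonic mean of known mAP and unknown mAP. A minimal sketch of that computation follows, assuming the standard two-value harmonic mean 2ku / (k + u); the function name `harmonic_map` and the sample scores are hypothetical and are not taken from the paper's results.

```python
def harmonic_map(map_known: float, map_unknown: float) -> float:
    """Harmonic mean of known-class mAP and unknown-class mAP.

    Assumes the standard definition 2 * k * u / (k + u); the result is
    low whenever either input is low, so both class groups must do well.
    """
    if map_known < 0 or map_unknown < 0:
        raise ValueError("mAP values must be non-negative")
    total = map_known + map_unknown
    if total == 0:
        # Both scores are zero: define the harmonic mean as zero.
        return 0.0
    return 2.0 * map_known * map_unknown / total


# Illustrative (made-up) scores: strong on known classes, weak on unknown.
imbalanced = harmonic_map(80.0, 20.0)
balanced = harmonic_map(50.0, 50.0)
```

With these made-up inputs, the imbalanced pair scores 32.0 while the balanced pair scores 50.0, even though both pairs have the same arithmetic mean. This is why a harmonic mean suits a setting where a detector must handle known and novel classes together.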

Theorems & Definitions (4)

  • Definition 1: 3D Object Detection
  • Definition 2: CWAL-3D
  • Definition 3: OWAL-3D
  • Remark 1