ActiveAnno3D -- An Active Learning Framework for Multi-Modal 3D Object Detection

Ahmed Ghita, Bjørk Antoniussen, Walter Zimmer, Ross Greer, Christian Creß, Andreas Møgelmose, Mohan M. Trivedi, Alois C. Knoll

TL;DR

ActiveAnno3D is an active-learning framework for multi-modal 3D object detection that aims to cut labeling cost while keeping accuracy close to full-data training. It pairs query strategies such as CRB and entropy-based sampling with continuous training to limit computational cost, and evaluates the LiDAR-only PV-RCNN and the LiDAR+camera BEVFusion on nuScenes and the TUM Traffic Intersection dataset. With half of the training data, PV-RCNN with entropy-based queries reaches 77.25 mAP on TUM Traffic Intersection (83.50 mAP with all data), and BEVFusion reaches 64.31 mAP on nuScenes (75.0 mAP with all data), so the gains depend on the dataset. The framework is integrated into the proAnno labeling tool for AI-assisted labeling, and future directions include Bayesian enhancements to CRB, guided sampling, and additional baselines.

Abstract

The curation of large-scale datasets is still costly and requires considerable time and resources. Data is often manually labeled, and the challenge of creating high-quality datasets remains. In this work, we fill the research gap using active learning for multi-modal 3D object detection. We propose ActiveAnno3D, an active learning framework to select data samples for labeling that are of maximum informativeness for training. We explore various continuous training methods and integrate the most efficient method regarding computational demand and detection performance. Furthermore, we perform extensive experiments and ablation studies with BEVFusion and PV-RCNN on the nuScenes and TUM Traffic Intersection datasets. We show that we can achieve almost the same performance with PV-RCNN and the entropy-based query strategy when using only half of the training data (77.25 mAP compared to 83.50 mAP) of the TUM Traffic Intersection dataset. BEVFusion achieved an mAP of 64.31 when using half of the training data and 75.0 mAP when using the complete nuScenes dataset. We integrate our active learning framework into the proAnno labeling tool to enable AI-assisted data selection and labeling and minimize the labeling costs. Finally, we provide code, weights, and visualization results on our website: https://active3d-framework.github.io/active3d-framework.
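The entropy-based query strategy mentioned in the abstract can be illustrated with a minimal sketch. It assumes that the current detector yields a class-probability vector for each detection in an unlabeled frame and that a frame is scored by the mean entropy of its detections; the function names, the mean aggregation, and the toy probabilities are illustrative assumptions, not the authors' implementation.

```python
import math


def entropy(probs):
    """Shannon entropy (in nats) of one class-probability vector."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def frame_score(detections):
    """Mean entropy over a frame's detections (assumed aggregation).

    A frame without detections gets a score of 0.0, which is also an assumption.
    """
    if not detections:
        return 0.0
    return sum(entropy(d) for d in detections) / len(detections)


def select_most_uncertain(pool, budget):
    """Return the ids of the `budget` frames with the highest mean entropy."""
    ranked = sorted(pool, key=lambda fid: frame_score(pool[fid]), reverse=True)
    return ranked[:budget]


# Toy unlabeled pool: frame id -> list of per-detection class probabilities.
pool = {
    "frame_a": [[0.98, 0.01, 0.01]],                      # confident prediction
    "frame_b": [[0.40, 0.35, 0.25], [0.50, 0.30, 0.20]],  # uncertain predictions
    "frame_c": [[0.70, 0.20, 0.10]],                      # in between
}
selected = select_most_uncertain(pool, budget=2)
```

In this toy pool the two frames with the flattest class distributions are sent to the annotator first, which is the intuition behind uncertainty-driven sampling.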

Paper Structure (17 sections, 1 equation, 9 figures, 2 tables)


Figures (9)

  • Figure 1: We propose a framework for efficient active learning within various 3D object detection techniques and modalities, demonstrating the effectiveness of active learning at reaching comparable detection performance on benchmark datasets at a fraction of the annotation cost. Datasets include roadside infrastructure sensors (top row) and onboard vehicle sensors (bottom row), with LiDAR-only and LiDAR+camera fusion methods, the two dominant strategies in state-of-the-art performance at the safety-critical detection task.
  • Figure 2: The generalized active learning flow involves the selection of data from an unlabeled pool according to an acquisition function, which, in the case of uncertainty-driven AL, utilizes the trained model or, in the case of diversity-driven AL, may be independent of the training. This selected data is then annotated by an oracle and aggregated with previously labeled data. Whether all data or only the new data is used in the next training step is determined by the choice of training strategy. The variety of possible acquisition and training techniques and the unique domain challenges posed by autonomous driving make active learning an opportune environment for innovation toward safe and accurate learning.
  • Figure 3: The graph on the left illustrates the mAP scores achieved by the PV-RCNN model on the TUM Traffic Intersection dataset relative to the expanding size of the training set in the active learning setting, with random and entropy queries shown separately. Similarly, the graph on the right illustrates the mAP scores achieved by the BEVFusion model on the nuScenes dataset relative to the expanding size of the training set.
  • Figure 4: The graph illustrates the mAP scores achieved by the PV-RCNN model on the TUM Traffic Intersection dataset relative to the expanding size of the training set in the active learning setting with different query strategies.
  • Figure : CRB
  • ...and 4 more figures
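
The generalized flow described in the Figure 2 caption (acquire from an unlabeled pool, annotate with an oracle, aggregate, retrain) can be sketched as a short loop. This is a toy sketch under stated assumptions: `active_learning_loop`, its parameters, and the stand-in acquisition, oracle, and training functions are hypothetical names chosen for illustration and do not come from the paper's code.

```python
import random


def active_learning_loop(pool, acquire, oracle, train, rounds, budget,
                         use_all_labeled=True):
    """Run a generic pool-based active learning loop.

    acquire(model, unlabeled, budget) -> samples to label next
    oracle(sample)                    -> label for one sample
    train(model, train_set)           -> updated model
    use_all_labeled selects the training strategy: retrain on everything
    labeled so far, or continue training on the newly labeled samples only.
    """
    unlabeled = list(pool)
    labeled = {}
    model = None
    history = []  # labeled-set size after each round
    for _ in range(rounds):
        if not unlabeled:
            break
        picked = acquire(model, unlabeled, budget)
        new_labels = {sample: oracle(sample) for sample in picked}
        picked_set = set(picked)
        unlabeled = [s for s in unlabeled if s not in picked_set]
        labeled.update(new_labels)
        train_set = labeled if use_all_labeled else new_labels
        model = train(model, train_set)
        history.append(len(labeled))
    return model, labeled, history


rng = random.Random(0)


def random_acquire(model, unlabeled, budget):
    """Random baseline query; ignores the model entirely."""
    return rng.sample(unlabeled, min(budget, len(unlabeled)))


def parity_oracle(sample):
    """Toy annotator: the label is the parity of the integer sample."""
    return sample % 2


def count_train(model, train_set):
    """Toy trainer: the 'model' is just the number of training samples used."""
    return len(train_set)


model, labeled, history = active_learning_loop(
    pool=range(10), acquire=random_acquire, oracle=parity_oracle,
    train=count_train, rounds=3, budget=2)
```

Swapping `random_acquire` for a model-dependent scoring function turns the random baseline into an uncertainty-driven strategy, and `use_all_labeled=False` corresponds to training on only the newly labeled data in each round.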