Label-Efficient 3D Object Detection For Road-Side Units

Minh-Quan Dao, Holger Caesar, Julie Stephany Berrio, Mao Shan, Stewart Worrall, Vincent Frémont, Ezio Malis

TL;DR

The paper addresses the annotation cost that limits RSU-assisted collaborative perception, which is itself a remedy for occlusion in autonomous driving, by proposing a label-efficient autolabeling framework. The framework combines unsupervised object discovery, built on spatial-temporal, multi-frame, and multi-scale aggregation of point clouds, with a refinement module, followed by a self-training loop and a fine-tuning stage on a small labeled subset. Experiments on the synthetic V2X-Sim dataset and the real-world A9 dataset show that fine-tuning on as few as 100 labeled point clouds brings the method close to, or on par with, fully supervised performance, with significant gains achieved during the initial self-training iterations. Qualitative results on real-world data from Sydney RSUs illustrate the practicality of the approach and its potential for scalable deployment of collaborative perception at intersections.

Abstract

Occlusion presents a significant challenge for safety-critical applications such as autonomous driving. Collaborative perception has recently attracted considerable research interest thanks to its ability to enhance the perception of autonomous vehicles via deep information fusion with intelligent roadside units (RSUs), thus minimizing the impact of occlusion. While significant advances have been made, the data-hungry nature of these methods creates a major hurdle for their real-world deployment, particularly due to the need for annotated RSU data. Manually annotating the vast amount of RSU data required for training is prohibitively expensive, given the sheer number of intersections and the effort involved in annotating point clouds. We address this challenge by devising a label-efficient object detection method for RSUs based on unsupervised object discovery. Our paper introduces two new modules: one for object discovery based on a spatial-temporal aggregation of point clouds, and another for refinement. Furthermore, we demonstrate that fine-tuning on a small portion of annotated data allows our object discovery models to narrow the performance gap with, or even surpass, fully supervised models. Extensive experiments are carried out on simulated and real-world datasets to evaluate our method.
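
The abstract mentions a spatial-temporal aggregation of point clouds. As a rough illustration only, and not the authors' implementation, the sketch below stacks consecutive point clouds from a stationary sensor into a single array and tags each point with its time offset to the newest frame. The function name `aggregate_frames`, the (N, 3) input layout, and the extra time column are all assumptions made for this example. The sketch also assumes that a fixed roadside sensor needs no motion compensation between frames.

```python
import numpy as np


def aggregate_frames(frames, timestamps):
    """Stack point clouds from a static sensor into one (N, 4) array.

    Hypothetical helper for illustration. Each frame is an (n_i, 3) array
    of x, y, z coordinates expressed in the same fixed sensor frame.
    The fourth output column is the time offset in seconds to the newest
    frame (zero for the newest frame, negative for older ones).
    """
    if len(frames) == 0:
        raise ValueError("at least one frame is required")
    if len(frames) != len(timestamps):
        raise ValueError("frames and timestamps must have the same length")

    t_ref = max(timestamps)  # newest frame serves as the time reference
    stacked = []
    for points, t in zip(frames, timestamps):
        points = np.asarray(points, dtype=np.float64).reshape(-1, 3)
        # One time-offset value per point, stored as a column vector.
        dt = np.full((points.shape[0], 1), t - t_ref)
        stacked.append(np.hstack([points, dt]))
    return np.vstack(stacked)


# Usage: three synthetic frames captured 0.1 s apart.
rng = np.random.default_rng(0)
frames = [rng.uniform(-50.0, 50.0, size=(n, 3)) for n in (100, 120, 90)]
merged = aggregate_frames(frames, [0.0, 0.1, 0.2])
```

Keeping the time offset as a per-point feature lets a later clustering or detection step tell static structure, which overlaps across frames, from moving objects, whose points shift over time. How the paper actually uses the aggregated points is not stated in this summary.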

Paper Structure (22 sections, 4 equations, 6 figures, 4 tables, 2 algorithms)

Figures (6)

  • Figure 1: Distribution of points in the proximity of two detections with and without the top face, measured by scale, calculated according to luo2023reward.
  • Figure 2: Overview of our method.
  • Figure 3: Comparison of discovered objects using one point cloud, and the concatenation of three point clouds.
  • Figure 4: The evolution of detection performance during self-training.
  • Figure 5: False positive predictions of the self-trained model due to the similar appearance of vehicles and background objects.
  • ...and 1 more figure