Commonsense Prototype for Outdoor Unsupervised 3D Object Detection

Hai Wu, Shijia Zhao, Xun Huang, Chenglu Wen, Xin Li, Cheng Wang

TL;DR

This paper introduces a Commonsense Prototype-based Detector, termed CPD, for unsupervised 3D object detection that outperforms state-of-the-art unsupervised 3D detectors on Waymo Open Dataset (WOD), PandaSet, and KITTI datasets by a large margin.

Abstract

Prevalent approaches to unsupervised 3D object detection follow cluster-based pseudo-label generation and iterative self-training. However, the sparsity of LiDAR scans leads to pseudo-labels with erroneous size and position, resulting in subpar detection performance. To tackle this problem, this paper introduces a Commonsense Prototype-based Detector, termed CPD, for unsupervised 3D object detection. CPD first constructs a Commonsense Prototype (CProto), characterized by a high-quality bounding box and dense points, based on commonsense intuition. Subsequently, CPD refines the low-quality pseudo-labels by leveraging the size prior from CProto. Furthermore, CPD enhances the detection accuracy of sparsely scanned objects by using the geometric knowledge from CProto. CPD outperforms state-of-the-art unsupervised 3D detectors on Waymo Open Dataset (WOD), PandaSet, and KITTI datasets by a large margin. Moreover, when trained on WOD and tested on KITTI, CPD attains 90.85% and 81.01% 3D Average Precision on the easy and moderate car classes, respectively. These results bring CPD close to fully supervised detectors, highlighting the significance of our method. The code will be available at https://github.com/hailanyi/CPD.
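The size-prior idea in the abstract can be illustrated with a small sketch. This is not the paper's refinement procedure, which the abstract does not spell out; it only shows the general idea of borrowing box sizes from trusted pseudo-labels. The function name `refine_sizes_with_prior`, the box layout `[x, y, z, l, w, h, yaw]`, the per-box quality score, and the threshold `0.8` are all assumptions made for illustration.

```python
from statistics import median


def refine_sizes_with_prior(boxes, scores, score_thresh=0.8):
    """Overwrite the size of low-scoring boxes with a size prior.

    boxes:  list of [x, y, z, l, w, h, yaw] pseudo-labels (assumed layout).
    scores: one hypothetical quality score in [0, 1] per box.
    The prior is the per-dimension median size of the high-scoring boxes.
    Centres and headings are left unchanged in this simplified sketch.
    """
    high = [b for b, s in zip(boxes, scores) if s >= score_thresh]
    if not high:
        # No trusted boxes, so there is no prior to borrow from.
        return [list(b) for b in boxes]

    # Median length, width, and height of the trusted boxes.
    prior = [median(b[i] for b in high) for i in (3, 4, 5)]

    refined = []
    for b, s in zip(boxes, scores):
        b = list(b)
        if s < score_thresh:
            b[3:6] = prior
        refined.append(b)
    return refined


pseudo_labels = [
    [0.0, 0.0, 0.0, 4.0, 2.0, 1.5, 0.0],   # well-scanned car
    [10.0, 0.0, 0.0, 4.4, 1.8, 1.7, 0.0],  # well-scanned car
    [20.0, 0.0, 0.0, 1.0, 0.5, 0.8, 0.0],  # sparse scan, box far too small
]
quality = [0.9, 0.95, 0.2]
refined = refine_sizes_with_prior(pseudo_labels, quality)
```

In this toy input the third box keeps its centre but takes the median size of the first two. A sketch like this corrects size only; the abstract notes that sparse scans also corrupt position, which this example does not address.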

Paper Structure (17 sections, 8 equations, 12 figures, 12 tables)

Figures (12)

  • Figure 1: Illustration of commonsense prototypes for unsupervised 3D object detection in autonomous driving scenes.
  • Figure 2: Illustration and statistics of complete and incomplete objects on the WOD validation set (large enough to demonstrate the general problem). (a) Pseudo-labels of complete objects $\mathcal{T}$ are refined by temporal consistency. (b) Pseudo-labels of incomplete objects $\mathcal{J}$ fail to be refined by temporal consistency. (c) 65% of objects lack full scan coverage and generate inaccurate pseudo-labels (max IoU (Intersection over Union) $< 0.5$ with GT (Ground Truth)). (d) The vehicle GT of complete objects $GT^\mathcal{T}$ and incomplete objects $GT^\mathcal{J}$ have similar size distributions. (e) The pseudo-labels of complete objects $Pse^\mathcal{T}$ and incomplete objects $Pse^\mathcal{J}$ have different size distributions. (f)(g) Nearby stationary objects have high completeness in consecutive frames.
  • Figure 3: CPD framework. (a) Initial pseudo-labels are generated by multi-frame clustering. (b) The commonsense prototype (CProto) is constructed from high-quality pseudo-labels based on the CSS score. The low-quality labels are further refined by the shape prior from CProto. (c) A prototype network fed with dense points from CProto produces high-quality features to guide the convergence of the detection network.
  • Figure 4: (a) Length absolute error with different frames. (b) Multi-level occupancy score. (c) Mean size error of initial labels.
  • Figure 5: Completeness and size similarity scoring.
  • ...and 7 more figures
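
Figure 2(c) counts a pseudo-label as inaccurate when its maximum IoU with any ground-truth box is below 0.5. A minimal sketch of that check follows, under a simplifying assumption: boxes are axis-aligned bird's-eye-view rectangles `(x_min, y_min, x_max, y_max)`, whereas the caption does not say which IoU variant the paper uses. The names `bev_iou` and `is_inaccurate` are illustrative and do not come from the paper.

```python
def bev_iou(a, b):
    """IoU of two axis-aligned rectangles given as (x_min, y_min, x_max, y_max)."""
    inter_w = min(a[2], b[2]) - max(a[0], b[0])
    inter_h = min(a[3], b[3]) - max(a[1], b[1])
    if inter_w <= 0 or inter_h <= 0:
        return 0.0  # the rectangles do not overlap
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)


def is_inaccurate(pseudo_box, gt_boxes, thresh=0.5):
    """True when the best-matching ground-truth box has IoU below thresh."""
    max_iou = max((bev_iou(pseudo_box, g) for g in gt_boxes), default=0.0)
    return max_iou < thresh


gt = [(0.0, 0.0, 4.0, 2.0)]          # a 4 m x 2 m vehicle footprint
complete = (0.0, 0.0, 4.0, 2.0)      # pseudo-label covering the whole object
incomplete = (0.0, 0.0, 1.5, 2.0)    # pseudo-label covering only a scanned part
iou_incomplete = bev_iou(incomplete, gt[0])
```

With these toy boxes, the partial pseudo-label overlaps 3 m² of an 8 m² union, so its IoU is 0.375 and it falls below the 0.5 threshold.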