Harnessing Uncertainty-aware Bounding Boxes for Unsupervised 3D Object Detection

Ruiyang Zhang; Hu Zhang; Hang Yu; Zhedong Zheng

TL;DR

This work tackles the noise in pseudo bounding boxes used for unsupervised 3D object detection by introducing UA3D, a two-phase framework that first estimates coordinate-level uncertainty via an auxiliary detector and then regularizes training by reweighting each bbox coordinate according to its uncertainty. The uncertainty is computed per coordinate across x, y, z, length, width, height, and orientation, enabling fine-grained mitigation of noisy labels through the loss terms $\mathcal{L}_{p}^{u}$ and $\mathcal{L}_{a}^{u}$ and the total loss $\mathcal{L}_{total}=\mathcal{L}_{p}^{u}+\mu\cdot\mathcal{L}_{a}^{u}$. Empirically, UA3D yields substantial improvements over prior unsupervised methods on nuScenes and Lyft, particularly for long-range objects, and ablations confirm the advantages of coordinate-level uncertainty, an appropriately sized auxiliary detector (with $\gamma=0.5$), and the stabilizing regularization coefficient $\lambda=10^{-5}$. The approach offers a practical, learnable mechanism to reduce the adverse effects of pseudo box noise and can be integrated with standard 3D detectors to enhance unsupervised learning in LiDAR-based perception.

Abstract

Unsupervised 3D object detection aims to identify objects of interest from unlabeled raw data, such as LiDAR points. Recent approaches usually adopt pseudo 3D bounding boxes (3D bboxes) from clustering algorithms to initialize model training. However, pseudo bboxes inevitably contain noise, and such inaccuracies propagate to the final model, compromising its performance. To mitigate the negative impact of inaccurate pseudo bboxes, we introduce a new uncertainty-aware framework for unsupervised 3D object detection, dubbed UA3D. In particular, our method consists of two phases: uncertainty estimation and uncertainty regularization. (1) In the uncertainty estimation phase, we incorporate an extra auxiliary detection branch alongside the original primary detector. The prediction disparity between the primary and auxiliary detectors reflects fine-grained uncertainty at the box coordinate level. (2) Based on the assessed uncertainty, we adaptively adjust the weight of every 3D bbox coordinate via uncertainty regularization, refining the training process on pseudo bboxes. Pseudo bbox coordinates with high uncertainty are assigned a relatively low loss weight. Extensive experiments verify that the proposed method is robust against noisy pseudo bboxes, yielding substantial improvements on nuScenes and Lyft compared to existing approaches, with increases of +6.9% AP$_{BEV}$ and +2.5% AP$_{3D}$ on nuScenes, and +4.1% AP$_{BEV}$ and +2.0% AP$_{3D}$ on Lyft.
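
The two phases described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation, and several choices are assumptions: the uncertainty is taken as the absolute primary/auxiliary discrepancy per coordinate, the per-coordinate loss is a plain L1 term, the down-weighting uses exp(-uncertainty) plus a penalty scaled by λ, and the function names and the default `mu=1.0` are hypothetical. Only the seven box coordinates, the form of the total loss, and the value λ = 10^{-5} come from the text.

```python
import numpy as np

# Box layout assumed here: (x, y, z, length, width, height, orientation).
THETA = 6


def coordinate_uncertainty(pred_primary, pred_aux):
    """Per-coordinate discrepancy between the two detectors, shape (N, 7)."""
    delta = np.abs(pred_primary - pred_aux)
    # Orientation is periodic, so wrap its discrepancy into [0, pi].
    d_theta = delta[:, THETA] % (2 * np.pi)
    delta[:, THETA] = np.minimum(d_theta, 2 * np.pi - d_theta)
    return delta


def uncertainty_weighted_loss(pred, pseudo, delta, lam=1e-5):
    """Assumed form: exp(-delta) * L1 + lam * delta, summed over coordinates."""
    # Plain L1 per coordinate (no angle wrapping; a simplification).
    per_coord = np.abs(pred - pseudo)
    # High uncertainty shrinks the weight of that coordinate; the lam term
    # discourages trivially large uncertainty.
    weighted = np.exp(-delta) * per_coord + lam * delta
    return weighted.sum(axis=1).mean()


def total_loss(pred_primary, pred_aux, pseudo, mu=1.0, lam=1e-5):
    """L_total = L_p^u + mu * L_a^u, with mu=1.0 as a placeholder value."""
    delta = coordinate_uncertainty(pred_primary, pred_aux)
    loss_p = uncertainty_weighted_loss(pred_primary, pseudo, delta, lam)
    loss_a = uncertainty_weighted_loss(pred_aux, pseudo, delta, lam)
    return loss_p + mu * loss_a


if __name__ == "__main__":
    pseudo = np.array([[1.0, 2.0, 0.0, 4.0, 2.0, 1.5, 0.1],
                       [10.0, 5.0, 0.0, 4.0, 2.0, 1.5, 1.2]])
    primary = pseudo + 0.1
    auxiliary = pseudo.copy()
    auxiliary[1, THETA] += 0.8  # detectors disagree on the second box's heading
    print(coordinate_uncertainty(primary, auxiliary))
    print(total_loss(primary, auxiliary, pseudo))
```

In a real training loop the discrepancy would be computed on matched predictions inside an autograd framework; the sketch only shows how a coordinate on which the two detectors disagree contributes less to the regression loss.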

Paper Structure (17 sections, 3 equations, 6 figures, 5 tables)

Introduction
Related Works
Method
Uncertainty Estimation
Uncertainty Regularization
Experiment
Settings
Comparison with State-of-the-art Methods
Ablation Studies and Further Discussion
Qualitative Analysis
Conclusion
Appendix
Implementation Details
Model Structure
More qualitative results
...and 2 more sections

Figures (6)

Figure 1: Our motivation. Pseudo boxes generated by clustering-based algorithms often contain noise (comparing (a) and (b)). Previous methods [you2022learning, zhang2023towards] directly utilize those noisy pseudo boxes to train the detection model, leading to suboptimal performance (see (c)). In contrast, we introduce uncertainty-aware pseudo boxes by assigning coordinate-level uncertainty. High uncertainty is assigned to inaccurate coordinates, and during training, the weights of these uncertain coordinates are adaptively reduced. This approach mitigates the negative impact of noisy pseudo boxes, yielding robust detection (comparing (c) and (d)).
Figure 2: Overall pipeline. Given an input point cloud, an auxiliary detector predicts the bounding boxes $\boldsymbol{\hat{B}_a}$ concurrently with the primary detector predictions $\boldsymbol{\hat{B}_p}$. We leverage the discrepancy between the two detector predictions as the uncertainty indicator $\boldsymbol{U}$. Specifically, high coordinate-level uncertainty is assigned to inaccurate pseudo box coordinates. For uncertainty regularization, the original detection loss is rectified by the estimated uncertainty as $\mathcal{L}_{p}^{u}$ and $\mathcal{L}_{a}^{u}$, reducing the weight of inaccurate pseudo boxes at the coordinate level. Note: SA refers to Set Abstraction, and FP refers to Feature Propagation. We insert the auxiliary detector after sa_layer_4 in the PointRCNN backbone. For uncertainty visualization, the purple box represents the uncertainty of length, width, and height, i.e., $\boldsymbol{\Delta_{l}}$, $\boldsymbol{\Delta_{w}}$, and $\boldsymbol{\Delta_{h}}$; the purple orthogonal lines indicate the uncertainty of the x, y, and z positions, i.e., $\boldsymbol{\Delta_{x}}$, $\boldsymbol{\Delta_{y}}$, and $\boldsymbol{\Delta_{z}}$; and the purple diagonal line denotes the uncertainty of orientation, i.e., $\boldsymbol{\Delta_{\theta}}$. We present a detailed explanation of our uncertainty visualization scheme in Fig. \ref{fig:Explaination}. In this example, the orientation of the pseudo box on the right is inaccurate. Our method assigns high uncertainty to the orientation and reduces its weight during model training.
Figure 3: Correspondence between pseudo label inaccuracy and high uncertainty. (a) We present ground truth and pseudo boxes in two different point clouds (left and right columns). Each point cloud contains both accurate and inaccurate pseudo boxes. We observe that pseudo boxes can be significantly inaccurate in terms of shape, location, and rotation. Using these boxes directly for training can easily impair the performance of the detection model. (b) We present the predictions from the primary and auxiliary detectors. The two detectors' predictions align closely for objects with accurate pseudo boxes but diverge for those with inaccurate ones. The mismatch between inaccurate pseudo boxes and the actual point cloud distribution can confuse the model, resulting in varying interpretations. (c) We present our uncertainty-aware pseudo boxes. Fine-grained coordinate-level uncertainty is estimated, e.g., the orientation uncertainty for the right object (in the left column) is high (as indicated by the long purple diagonal line), due to the inaccurate orientation of its pseudo box. The colors follow the same conventions as in Fig. \ref{fig:Method}. A detailed explanation of our uncertainty visualization scheme is shown in Fig. \ref{fig:Explaination}.
Figure 4: Visual comparison between different methods. We compare the predictions of MODEST [you2022learning], OYSTER [zhang2023towards], and our uncertainty-aware framework. Green boxes denote ground truth boxes and red boxes are predictions. (a) Generally, our method shows a clear improvement in box coordinate accuracy over previous methods. (b) For challenging objects that have few points or are far away, our method still retains a higher recall rate.
Figure 5: Further qualitative comparison between different methods. We compare our uncertainty-aware framework with previous works, e.g., MODEST and OYSTER. Green boxes denote the ground truth and red boxes represent predictions from the detection model. (a) Our uncertainty-aware framework localizes various foreground objects more accurately. (b) In challenging scenarios, such as distant objects with sparse point clouds or small objects, our method achieves a higher recall rate.
...and 1 more figures
