GUPNet++: Geometry Uncertainty Propagation Network for Monocular 3D Object Detection

Yan Lu, Xinzhu Ma, Lei Yang, Tianzhu Zhang, Yating Liu, Qi Chu, Tong He, Yonghui Li, Wanli Ouyang

TL;DR

GUPNet++ tackles the fundamental issue of error amplification in geometry-based monocular 3D detection by modeling the perspective projection as a probabilistic process. Depth is represented as a distribution derived from 2D and 3D height uncertainties, yielding a geometry-guided depth uncertainty that informs training and inference. The approach introduces a beta-NLL Laplacian loss for height and depth, an IoU-guided Uncertainty-Confidence mechanism to translate depth uncertainty into detection scores, and a streamlined uncertainty-based optimization that eliminates curriculum learning. Extensive experiments on KITTI and nuScenes demonstrate state-of-the-art performance with improved reliability and efficiency, highlighting the practical impact of uncertainty propagation in monocular 3D perception.

Abstract

Geometry plays a significant role in monocular 3D object detection. It can be used to estimate object depth through the perspective projection between an object's physical size and its 2D projection in the image plane, which introduces mathematical priors into deep models. However, this projection process also introduces error amplification: the error in the estimated height is amplified and reflected in the projected depth. This leads to unreliable depth inference and impairs training stability. To tackle this problem, we propose a novel Geometry Uncertainty Propagation Network (GUPNet++) that models geometry projection in a probabilistic manner. This ensures depth predictions are well-bounded and associated with a reasonable uncertainty. The significance of introducing such geometric uncertainty is two-fold: (1) It models the uncertainty propagation relationship of the geometry projection during training, improving the stability and efficiency of end-to-end model learning. (2) It can be converted into a highly reliable confidence that indicates the quality of the 3D detection result, enabling more reliable detection inference. Experiments show that the proposed approach not only obtains state-of-the-art (SOTA) performance in image-based monocular 3D detection but also demonstrates superior efficacy with a simplified framework.
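
To make the error-amplification effect concrete, the following minimal sketch uses the usual pinhole relation, depth = focal length × 3D height / 2D height. The focal length, object heights, and function names are illustrative assumptions, not values or code from the paper. The second function is a generic first-order (delta-method) propagation under an independence assumption and is not claimed to be the closed form derived by the authors.

```python
import math


def projected_depth(focal_px: float, height_3d_m: float, height_2d_px: float) -> float:
    """Pinhole projection: depth = f * H_3d / h_2d (all inputs assumed positive)."""
    return focal_px * height_3d_m / height_2d_px


def depth_std_first_order(focal_px: float, height_3d_m: float, height_2d_px: float,
                          std_3d_m: float, std_2d_px: float) -> float:
    """First-order propagation of height uncertainties into depth.

    Assumes the 3D and 2D height errors are independent and small, so the
    relative errors add in quadrature (a textbook approximation).
    """
    depth = projected_depth(focal_px, height_3d_m, height_2d_px)
    relative = math.sqrt((std_3d_m / height_3d_m) ** 2 + (std_2d_px / height_2d_px) ** 2)
    return depth * relative


# Illustrative (hypothetical) numbers: a 1.5 m tall object seen 27 px tall.
focal = 720.0
depth = projected_depth(focal, 1.5, 27.0)
# Depth shift caused by a +0.1 m error in the estimated 3D height.
shift = projected_depth(focal, 1.5 + 0.1, 27.0) - depth
# The same 0.1 m uncertainty expressed as a propagated depth standard deviation.
sigma_d = depth_std_first_order(focal, 1.5, 27.0, std_3d_m=0.1, std_2d_px=0.0)

print(f"depth={depth:.2f} m, shift={shift:.2f} m, sigma_d={sigma_d:.2f} m")
```

With these assumed numbers, a 0.1 m height error is scaled by `focal / height_2d_px`, so the depth error is more than an order of magnitude larger than the height error, and it grows as the object's 2D height shrinks.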

Paper Structure (34 sections, 41 equations, 14 figures, 8 tables)

Figures (14)

  • Figure 1: Visualized examples of the depth shift caused by $\pm$0.1m 3D height jitter. We draw some bird's-eye-view examples to show the error amplification effect. Both the horizontal and vertical axes are in meters, and the vertical axis corresponds to the depth direction. The green boxes are the original projection outputs. The blue and red boxes are the shifted boxes caused by +0.1m and -0.1m 3D height bias, respectively (best viewed in color).
  • Figure 2: The main pipeline of our Geometry Uncertainty Propagation module. The projection process is modeled with uncertainty theory in a probabilistic framework. The inferred depths can be represented as a distribution that provides both accurate values and scores.
  • Figure 3: The framework of GUPNet++. The input image is processed by the network to extract the 2D box and basic 3D box parameters. The Geometry Uncertainty Propagation module estimates the depth from the height parameters, helping both training and inference.
  • Figure 4: The computation pipeline of the IoU-guided Uncertainty-Confidence. The dashed box is the furthest potential true box that has an IoU of 0.7 with our predicted box (the solid one). Given this box, the area of the orange region under the depth distribution curve gives the confidence of the predicted box $B^P$.
  • Figure 5: Visualized uncertainty examples on the validation set. The first row (blue boxes) shows the results of our GUPNet++, the second row (yellow boxes) the GUPNet results, and the third row (red boxes) the baseline results. The fourth row shows the bird's-eye-view results (green denotes the ground-truth boxes). IoU is the Intersection-over-Union between the predicted box and the corresponding ground-truth box, and the uncertainty is the depth uncertainty $\sigma_d$ (best viewed in color).
  • ...and 9 more figures
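
The confidence computation described in the Figure 4 caption can be sketched roughly as follows. This is a simplified illustration under several assumptions that do not come from the paper: the predicted and true boxes are identical in size and orientation and differ only by a shift along the depth axis, the depth distribution is a Laplace distribution with standard deviation $\sigma_d$, and the probability mass is taken over a symmetric interval around the predicted depth. The function names are hypothetical.

```python
import math


def depth_tolerance_for_iou(length_m: float, iou_threshold: float = 0.7) -> float:
    """Depth offset at which two identical boxes shifted only along depth hit the IoU threshold.

    For a shift delta < L along an edge of length L, the overlap is (L - delta)
    and the union is (L + delta) times the shared cross-section, so
    IoU = (L - delta) / (L + delta), which gives delta = L * (1 - IoU) / (1 + IoU).
    """
    return length_m * (1.0 - iou_threshold) / (1.0 + iou_threshold)


def laplace_mass_within(delta_m: float, sigma_d: float) -> float:
    """P(|d - mu| <= delta) for a Laplace distribution with standard deviation sigma_d.

    A Laplace distribution with scale b has variance 2 * b**2, so b = sigma_d / sqrt(2).
    """
    if sigma_d <= 0.0:
        raise ValueError("sigma_d must be positive")
    scale = sigma_d / math.sqrt(2.0)
    return 1.0 - math.exp(-delta_m / scale)


# Illustrative (hypothetical) values: a 4 m long box along the depth axis.
delta = depth_tolerance_for_iou(4.0, 0.7)
confident = laplace_mass_within(delta, sigma_d=0.3)
uncertain = laplace_mass_within(delta, sigma_d=2.0)

print(f"delta={delta:.3f} m, conf(sigma=0.3)={confident:.3f}, conf(sigma=2.0)={uncertain:.3f}")
```

Under these assumptions, a smaller depth uncertainty concentrates more probability mass inside the tolerance interval and therefore yields a higher confidence, which matches the qualitative behavior the caption describes.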