CornerPoint3D: Look at the Nearest Corner Instead of the Center

Ruixiao Zhang, Runwei Guan, Xiangyu Chen, Adam Prugel-Bennett, Xiaohao Cai

TL;DR

The paper addresses the instability of center-based 3D detectors under cross-domain shifts caused by LiDAR occlusion and varying point densities. It introduces two metrics, AP_CS-ABS and AP_CS-BEV, to quantify closer-surfaces detection, and presents EdgeHead as a second-stage refinement to emphasize learning from surfaces near the LiDAR while preserving whole-box quality. It also proposes CornerPoint3D, a nearest-corner detector built on CenterPoint with a nearest-corner heatmap and a Multi-scale Gated Module (MSGM), augmented by EdgeHead to achieve robust cross-domain performance. Across multiple cross-domain tasks (Waymo/nuScenes to KITTI, etc.), CornerPoint3D and EdgeHead deliver substantial improvements in the proposed closer-surfaces metrics while maintaining competitive standard BEV/3D metrics. The work offers a practical pathway to safer and more robust cross-domain 3D object detection by leveraging visible surface information and targeted refinement techniques, compatible with existing domain-adaptation strategies like ROS and SN augmentation.

Abstract

3D object detection aims to predict object centers, dimensions, and rotations from LiDAR point clouds. Although this formulation is simple, LiDAR captures only the near side of objects, making center-based detectors prone to poor localization accuracy in cross-domain tasks with varying point distributions. Meanwhile, existing evaluation metrics, designed for single-domain assessment, also suffer from overfitting due to dataset-specific size variations. A key question arises: do we really need models to maintain excellent performance over entire 3D bounding boxes after being applied across domains? In practice, one of our main concerns is preventing collisions between vehicles and other obstacles, especially in cross-domain scenarios where correctly predicting object sizes is much more difficult. To address these issues, we rethink cross-domain 3D object detection from a practical perspective. We propose two new metrics that evaluate a model's ability to detect the surfaces of objects closer to the LiDAR sensor. Additionally, we introduce EdgeHead, a refinement head that guides models to focus more on learnable closer surfaces, significantly improving cross-domain performance under both our new and traditional BEV/3D metrics. Furthermore, we argue that predicting the nearest corner rather than the object center enhances robustness. We propose a novel 3D object detector, named CornerPoint3D, which is built upon CenterPoint and uses heatmaps to supervise the learning and detection of the nearest corner of each object. Our proposed methods achieve a balanced trade-off between the detection quality of entire bounding boxes and the localization accuracy of the surfaces closer to the LiDAR sensor, outperforming the traditional center-based detector CenterPoint in multiple cross-domain tasks and providing a more practical and robust cross-domain 3D object detection solution.
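To make the nearest-corner idea concrete, the following is a minimal sketch of how the nearest corner of a bird's-eye-view (BEV) box could be computed. It is not the authors' implementation: the box parameterization (center, length, width, yaw), the sensor placed at the origin, and the function names `bev_corners` and `nearest_corner` are all assumptions made for illustration.

```python
import math


def bev_corners(cx, cy, length, width, yaw):
    """Return the four BEV corners of a box centered at (cx, cy).

    Assumed convention: `length` runs along the box's local x-axis,
    `width` along its local y-axis, and `yaw` is in radians.
    """
    c, s = math.cos(yaw), math.sin(yaw)
    corners = []
    for dx, dy in ((0.5, 0.5), (0.5, -0.5), (-0.5, -0.5), (-0.5, 0.5)):
        lx, ly = dx * length, dy * width  # corner in the box's local frame
        corners.append((cx + lx * c - ly * s, cy + lx * s + ly * c))
    return corners


def nearest_corner(cx, cy, length, width, yaw, sensor=(0.0, 0.0)):
    """Return the corner closest to the sensor (assumed to sit at the origin)."""
    corners = bev_corners(cx, cy, length, width, yaw)
    return min(
        corners,
        key=lambda p: math.hypot(p[0] - sensor[0], p[1] - sensor[1]),
    )


# Two boxes of different length that share the same near side:
# the centers differ by 1 m, but the nearest corner is identical.
short_box = nearest_corner(10.0, 5.0, 4.0, 2.0, 0.0)
long_box = nearest_corner(11.0, 5.0, 6.0, 2.0, 0.0)
print(short_box, long_box)
```

In this toy case the two boxes have different sizes and centers but the same nearest corner, which illustrates why a corner target may be less sensitive than a center target to dataset-specific object sizes.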

Paper Structure

This paper contains 34 sections, 13 equations, 10 figures, 9 tables.

Figures (10)

  • Figure 1: Prediction illustrations of different detectors in cross-domain tasks. The left and right columns showcase the prediction properties when the training domain has, respectively, a larger and a smaller average object size than the target domain. (a) Traditional center-based methods predict the object centers without available point cloud data surrounding them, often resulting in overfitting and guessing about the center location based on average object sizes in training data. (b) Our proposed refinement head, EdgeHead, improves the detection of surfaces closer to the ego vehicle, mitigating the impact of size overfitting in occlusion avoidance. (c) Our proposed novel detector is based on nearest-corner prediction, enabling more robust detection in cross-domain 3D object detection tasks.
  • Figure 2: Illustration of different regression processes. (a) The process that directly regresses the closest vertex and rotations without rotating the anchor box first. (b) The prediction obtained using the process in (a), in which the red arrow shows that the prediction does not learn the closest vertex as expected. (c) The regression process guided by Eq. \ref{eq:final_reg_loss} and Eq. \ref{eq:reg_loss_new} in our EdgeHead, which first rotates the anchor by $\theta_{\rm gt}$ and then calculates the regression target of $x$ and $y$ locations.
  • Figure 3: Overview of CornerPoint3D. Standard 3D and 2D backbones are first applied to obtain voxelized 3D and 2D BEV (bird’s-eye view) features from the LiDAR point cloud data. The BEV features are then fed into the MSGM (multi-scale gated module) to extract adaptive features, which is especially important in cross-domain 3D object detection tasks. Next, the shared multi-scale features are fed into separate 2D CNN detection heads to predict the heatmaps of the objects’ nearest corners (to the LiDAR sensor) and other properties of the entire 3D bounding boxes. Finally, EdgeHead \cite{zhang2024detect} can optionally be applied for second-stage refinement.
  • Figure 4: Illustration of the MSGM (multi-scale gated module). In the upper branch, backbone features are processed through a global average pooling layer, followed by two fully connected layers and a softmax function, generating gated weights for the multi-scale convolution features. In the lower branch, three convolutions with different kernel sizes are applied to capture features at multiple receptive fields. The outputs of these convolutions are then combined according to the corresponding gated weights.
  • Figure 5: Different conditions of the corner-box relationship. (a) Given a center point, it will only belong to one box. (b) Given a corner point, it may belong to four boxes. (c) By limiting the boxes to those where the point is the nearest one among the four corners, two box candidates still exist in some cases. (d) We utilize an additional separate head to predict the relative position vector between the nearest corner and the center of the same box, i.e., extra guidance is provided to ensure accurate box selection.
  • ...and 5 more figures
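
The gating mechanism described in the Figure 4 caption can be sketched roughly as follows. This is an illustrative sketch only, not the authors' code: it assumes one scalar gate per convolution branch, a ReLU between the two fully connected layers, and it takes the three multi-scale convolution outputs as precomputed inputs (the kernel sizes are not given in the caption). The names `global_avg_pool`, `linear`, `softmax`, and `gated_fusion` are hypothetical.

```python
import math


def global_avg_pool(feat):
    """Average each channel of a C x H x W nested list into one scalar."""
    return [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in feat]


def linear(x, weight, bias):
    """Fully connected layer; `weight` is an (out x in) nested list."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weight, bias)]


def softmax(z):
    """Numerically stable softmax over a list of logits."""
    m = max(z)
    e = [math.exp(v - m) for v in z]
    total = sum(e)
    return [v / total for v in e]


def gated_fusion(backbone_feat, branch_feats, fc1, fc2):
    """Combine multi-scale branch outputs using softmax gate weights.

    backbone_feat: C x H x W features feeding the gating (upper) branch.
    branch_feats:  list of C x H x W outputs from the convolution branches
                   (assumed to be computed elsewhere, one per kernel size).
    fc1, fc2:      (weight, bias) tuples for the two fully connected layers.
    """
    pooled = global_avg_pool(backbone_feat)
    hidden = [max(0.0, h) for h in linear(pooled, *fc1)]  # ReLU: an assumption
    gates = softmax(linear(hidden, *fc2))  # one weight per branch (assumed)
    n_ch = len(branch_feats[0])
    height = len(branch_feats[0][0])
    width = len(branch_feats[0][0][0])
    fused = [
        [
            [
                sum(g * b[c][i][j] for g, b in zip(gates, branch_feats))
                for j in range(width)
            ]
            for i in range(height)
        ]
        for c in range(n_ch)
    ]
    return fused, gates
```

Because the gates come from a softmax, they sum to one, so the fused output is a convex combination of the branch features; with all-zero logits each of the three branches would contribute equally.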