Rethinking the Encoding and Annotating of 3D Bounding Box: Corner-Aware 3D Object Detection from Point Clouds

Qinghao Meng, Junbo Yin, Jianbing Shen, Yunde Jia

TL;DR

This work tackles the instability of center-aligned regression in LiDAR-based 3D detection by introducing corner-aligned regression, leveraging dense BEV corner observations to improve geometric consistency. It systematically analyzes five corner-encoding schemes, identifying full-corner encoding as the most robust, and presents a two-stage corner-aware detector that can operate under full or partial supervision using BEV corner annotations and height priors from 2D detections. A practical corner-click annotation protocol and a weak-to-full learning strategy enable recovery of complete 3D boxes, including height, from partial signals, with geometric constraints guiding the recovery. On KITTI, the method improves 3D AP by about 3.5 points over a center-based baseline and reaches approximately 83% of fully supervised accuracy using only BEV corner annotations, underscoring the practicality and scalability of corner-aware regression for 3D detection.

Abstract

Center-aligned regression remains dominant in LiDAR-based 3D object detection, yet it suffers from fundamental instability: object centers often fall in sparse or empty regions of the bird's-eye-view (BEV) due to the front-surface-biased nature of LiDAR point clouds, leading to noisy and inaccurate bounding box predictions. To circumvent this limitation, we revisit the bounding box representation and propose corner-aligned regression, which shifts the prediction target from unstable centers to geometrically informative corners that reside in dense, observable regions. Leveraging the inherent geometric constraints among corners and image 2D boxes, partial parameters of 3D bounding boxes can be recovered from corner annotations, enabling a weakly supervised paradigm without requiring complete 3D labels. We design a simple yet effective corner-aware detection head that can be plugged into existing detectors. Experiments on KITTI show that our method improves performance by 3.5% AP over a center-based baseline and achieves 83% of fully supervised accuracy using only BEV corner clicks, demonstrating the effectiveness of our corner-aware regression strategy.
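To make the relation between the center-based and corner-based views of a BEV box concrete, here is a minimal sketch of the standard geometry that links them. It is not the paper's encoding or detection head: the function names, the corner ordering (counter-clockwise from the front-left corner), and the convention that yaw is measured from the x-axis along the box length are all assumptions made for illustration.

```python
import math


def bev_box_to_corners(cx, cy, length, width, yaw):
    """Return the four BEV corners of a rotated box.

    Assumed ordering: front-left, rear-left, rear-right, front-right
    (counter-clockwise), with the length axis pointing along `yaw`.
    """
    c, s = math.cos(yaw), math.sin(yaw)
    # Corner offsets in the box frame: x along the length, y along the width.
    local = [
        (length / 2, width / 2),
        (-length / 2, width / 2),
        (-length / 2, -width / 2),
        (length / 2, -width / 2),
    ]
    # Rotate each offset by yaw, then translate by the center.
    return [(cx + c * dx - s * dy, cy + s * dx + c * dy) for dx, dy in local]


def corners_to_bev_box(corners):
    """Recover (cx, cy, length, width, yaw) from four ordered BEV corners.

    Assumes the same ordering as `bev_box_to_corners` and an exact rectangle.
    """
    (x0, y0), (x1, y1), (x2, y2), (x3, y3) = corners
    cx = (x0 + x1 + x2 + x3) / 4
    cy = (y0 + y1 + y2 + y3) / 4
    length = math.hypot(x0 - x1, y0 - y1)  # rear-left to front-left edge
    width = math.hypot(x1 - x2, y1 - y2)   # rear-right to rear-left edge
    yaw = math.atan2(y0 - y1, x0 - x1)     # heading of the length edge
    return cx, cy, length, width, yaw


if __name__ == "__main__":
    box = (10.0, 2.0, 4.0, 1.8, 0.3)
    corners = bev_box_to_corners(*box)
    print(corners)
    print(corners_to_bev_box(corners))
```

The round trip shows that four ordered BEV corners determine the planar box parameters, which is why corner annotations can stand in for those parameters; height is not recoverable from BEV corners alone, and the abstract attributes that part to constraints from image 2D boxes.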

Paper Structure

This paper contains 13 sections, 12 equations, 5 figures, 2 tables.

Figures (5)

  • Figure 1: Comparisons of 3D bounding box proposal prediction errors (x, y, z, length, width, height, yaw angle) between the baseline (PointRCNN, Shi_2019_CVPR) and the proposed method (a-b), visualizations of a point cloud scene for the baseline and for the version with our improvement (c-d), and comparisons of location error (e-f).
  • Figure 2: Impact of prediction errors on IoU for different object detection formulations.
  • Figure 3: Five different corner encoding schemes.
  • Figure 4: The proposed corner-aware 3D object detection framework. The inference route is shown with green arrows for the first stage (stage-1) and blue arrows for the second stage (stage-2). The fully supervised learning part, with corner-encoded 3D bounding box labels, is shown with red arrows. The weakly supervised learning part, with raw corner labels and the 2D geometrical constraint, is shown with brown and light blue arrows and is detailed later in Sec. \ref{subs:access_corner}. The framework can be trained with either full 3D bounding box annotations or weakly supervised corner annotations. The two settings therefore use different foreground definitions (left for Fully Supervised Learning (FSL), right for Weakly Supervised Learning (WSL)) and different loss functions. The height information for WSL comes from the geometrical constraint between the corner annotations and 2D image box supervision.
  • Figure 5: Examples of the two erasing region strategies: the small box region strategy (left) and the cross region strategy (right). The region regarded as background is colored light blue. Points that lie inside ground-truth boxes and outside the erasing region are marked as foreground and colored orange; the remaining points inside the box are erased points and are colored blue, like the background points.