From Points to Parts: 3D Object Detection from Point Cloud with Part-aware and Part-aggregation Network
Shaoshuai Shi, Zhe Wang, Jianping Shi, Xiaogang Wang, Hongsheng Li
TL;DR
This work tackles 3D object detection from LiDAR point clouds by introducing the Part-$A^2$ net, a two-stage framework. The first (part-aware) stage predicts intra-object part locations and generates 3D proposals; the second (part-aggregation) stage pools these part cues with RoI-aware pooling and uses sparse convolution to score and refine the boxes. Foreground segmentation, part-location estimation, and proposal refinement are optimized jointly, and the framework supports both anchor-free and anchor-based proposal strategies. The approach reports state-of-the-art KITTI results using only LiDAR data, with ablations that measure the contribution of each component. The resulting detector is geometry-aware and trained end to end, and its use of sparse convolution and RoI-aware pooling keeps it efficient.
Abstract
3D object detection from LiDAR point clouds is a challenging problem in 3D scene understanding and has many practical applications. In this paper, we extend our preliminary work PointRCNN to a novel and strong point-cloud-based 3D object detection framework, the part-aware and aggregation neural network (Part-$A^2$ net). The whole framework consists of the part-aware stage and the part-aggregation stage. First, the part-aware stage, for the first time, fully utilizes the free-of-charge part supervision derived from 3D ground-truth boxes to simultaneously predict high-quality 3D proposals and accurate intra-object part locations. The predicted intra-object part locations within the same proposal are grouped by our newly designed RoI-aware point cloud pooling module, which results in an effective representation that encodes the geometry-specific features of each 3D proposal. Then the part-aggregation stage learns to re-score the box and refine the box location by exploring the spatial relationship of the pooled intra-object part locations. Extensive experiments are conducted to demonstrate the performance improvements from each component of our proposed framework. Our Part-$A^2$ net outperforms all existing 3D detection methods and achieves a new state of the art on the KITTI 3D object detection dataset by utilizing only the LiDAR point cloud data. Code is available at https://github.com/sshaoshuai/PointCloudDet3D.
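To make the idea of "free-of-charge" part supervision concrete, the following is a minimal sketch of how an intra-object part location could be derived from a ground-truth box alone: each point is moved into the box's own coordinate frame and normalized so that the box spans [0, 1] along every axis. The function name `part_location_labels`, the box layout `(cx, cy, cz, dx, dy, dz, yaw)`, and the choice to give background points an all-zero label are assumptions made for illustration, not details confirmed by the abstract or taken from the released code.

```python
import math


def part_location_labels(points, box):
    """Return a list of (is_foreground, (px, py, pz)) for one 3D box.

    points: iterable of (x, y, z) coordinates in the LiDAR frame.
    box:    (cx, cy, cz, dx, dy, dz, yaw) with the box center, the box
            sizes along its own x/y/z axes, and the rotation around the
            vertical z axis. This layout is an assumption of the sketch.
    """
    cx, cy, cz, dx, dy, dz, yaw = box
    cos_y, sin_y = math.cos(yaw), math.sin(yaw)
    labels = []
    for x, y, z in points:
        # Translate to the box center, then rotate by -yaw into the box frame.
        tx, ty, tz = x - cx, y - cy, z - cz
        lx = tx * cos_y + ty * sin_y
        ly = -tx * sin_y + ty * cos_y
        lz = tz
        # Normalize by the box size and shift so the box spans [0, 1].
        part = (lx / dx + 0.5, ly / dy + 0.5, lz / dz + 0.5)
        inside = all(0.0 <= p <= 1.0 for p in part)
        # Assumption: points outside the box get an all-zero part label.
        labels.append((inside, part if inside else (0.0, 0.0, 0.0)))
    return labels


if __name__ == "__main__":
    gt_box = (10.0, 5.0, 0.0, 4.0, 2.0, 1.5, 0.0)
    pts = [(10.0, 5.0, 0.0), (11.0, 5.5, 0.0), (20.0, 5.0, 0.0)]
    for fg, part in part_location_labels(pts, gt_box):
        print(fg, part)
```

Under these assumptions the box center maps to (0.5, 0.5, 0.5), and the label of every foreground point depends only on the annotated box, which is why no annotation beyond the 3D boxes is needed.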
