From Points to Parts: 3D Object Detection from Point Cloud with Part-aware and Part-aggregation Network

Shaoshuai Shi, Zhe Wang, Jianping Shi, Xiaogang Wang, Hongsheng Li

TL;DR

This work addresses 3D object detection from LiDAR point clouds by introducing the Part-$A^2$ net, a two-stage framework. The first (part-aware) stage predicts intra-object part locations and generates 3D proposals; the second (part-aggregation) stage pools these part cues with RoI-aware point cloud pooling and sparse convolution to score and refine the boxes. The network jointly optimizes foreground segmentation, part-location estimation, and proposal refinement with an end-to-end loss, and supports both anchor-free and anchor-based proposal strategies. It achieves state-of-the-art results on KITTI using only LiDAR data, and ablation studies validate the contribution of each component. The framework is geometry-aware and end-to-end, can extend to other 3D detection tasks and settings, and gains efficiency from sparse convolution and differentiable RoI pooling.

Abstract

3D object detection from LiDAR point clouds is a challenging problem in 3D scene understanding and has many practical applications. In this paper, we extend our preliminary work PointRCNN to a novel and strong point-cloud-based 3D object detection framework, the part-aware and aggregation neural network (Part-$A^2$ net). The whole framework consists of the part-aware stage and the part-aggregation stage. First, the part-aware stage, for the first time, fully utilizes free-of-charge part supervision derived from 3D ground-truth boxes to simultaneously predict high-quality 3D proposals and accurate intra-object part locations. The predicted intra-object part locations within the same proposal are grouped by our newly designed RoI-aware point cloud pooling module, which yields an effective representation for encoding the geometry-specific features of each 3D proposal. Then the part-aggregation stage learns to re-score the box and refine the box location by exploring the spatial relationship of the pooled intra-object part locations. Extensive experiments are conducted to demonstrate the performance improvements from each component of our proposed framework. Our Part-$A^2$ net outperforms all existing 3D detection methods and achieves a new state of the art on the KITTI 3D object detection dataset using only LiDAR point cloud data. Code is available at https://github.com/sshaoshuai/PointCloudDet3D.
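The "free-of-charge" part supervision mentioned in the abstract can be illustrated with a minimal sketch. It assumes a box parameterized as center, size, and a yaw angle about the vertical axis, and it normalizes each foreground point's position inside the box to the range [0, 1]. The function name `part_locations`, the box layout, and the axis convention are hypothetical choices for illustration and are not taken from the authors' code.

```python
import numpy as np


def part_locations(points, box):
    """Relative position of each point inside one 3D box.

    points: (N, 3) array of x, y, z coordinates.
    box:    (cx, cy, cz, dx, dy, dz, yaw), with yaw measured about the z axis
            (an assumed convention for this sketch).
    Returns (labels, inside): labels is (N, 3) in [0, 1] for points inside
    the box and 0 elsewhere; inside is a boolean foreground mask.
    """
    cx, cy, cz, dx, dy, dz, yaw = box
    shifted = points[:, :3] - np.array([cx, cy, cz])

    # Rotate by -yaw so the box axes line up with the coordinate axes.
    c, s = np.cos(yaw), np.sin(yaw)
    local_x = c * shifted[:, 0] + s * shifted[:, 1]
    local_y = -s * shifted[:, 0] + c * shifted[:, 1]
    local = np.stack([local_x, local_y, shifted[:, 2]], axis=1)

    # Map [-size/2, size/2] to [0, 1]; the box center maps to 0.5.
    rel = local / np.array([dx, dy, dz]) + 0.5
    inside = np.all((rel >= 0.0) & (rel <= 1.0), axis=1)

    # Background points get an all-zero label.
    labels = np.where(inside[:, None], rel, 0.0)
    return labels, inside
```

Because the labels depend only on the annotated box and the point coordinates, they need no annotation beyond the ground-truth boxes, which is what makes this supervision free.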

Paper Structure

This paper contains 26 sections, 17 equations, 11 figures, 15 tables.

Figures (11)

  • Figure 1: Our proposed part-aware and aggregation network can accurately predict intra-object part locations even when objects are partially occluded. Such part locations can assist accurate 3D object detection. The intra-object part locations predicted by our method are visualized with colors interpolated from the eight box corners. Best viewed in color.
  • Figure 2: The overall framework of our part-aware and aggregation neural network for 3D object detection. It consists of two stages: (a) the part-aware stage-I, for the first time, predicts intra-object part locations and generates 3D proposals by feeding the point cloud to our encoder-decoder network. (b) The part-aggregation stage-II applies the proposed RoI-aware point cloud pooling operation to aggregate the part information from each 3D proposal; the part-aggregation network then scores boxes and refines their locations based on the part features and information from stage-I.
  • Figure 3: Comparison of voxelized point cloud and raw point cloud in autonomous driving scenarios. The center of each non-empty voxel is treated as a point to form the voxelized point cloud. The voxelized point cloud is approximately equivalent to the raw point cloud, and the 3D shapes of objects are well preserved for 3D object detection.
  • Figure 4: Illustration of intra-object part locations for foreground points. Here we use interpolated colors to indicate the intra-object part location of each point. Best viewed in color.
  • Figure 5: Illustration of bin-based center localization. The surrounding area along $X$ and $Y$ axes of each foreground point is split into a series of bins to locate the object center.
  • ...and 6 more figures
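
The voxelized point cloud described in the Figure 3 caption, where the center of each non-empty voxel stands in for the points it contains, can be sketched as follows. This is a minimal illustration under stated assumptions: the function name `voxelize_to_centers`, the `origin` parameter, and the voxel sizes in the usage note are hypothetical and are not taken from this page.

```python
import numpy as np


def voxelize_to_centers(points, voxel_size, origin=(0.0, 0.0, 0.0)):
    """Replace a raw point cloud by the centers of its non-empty voxels.

    points:     (N, 3+) array; only the first three columns (x, y, z) are used.
    voxel_size: (3,) edge lengths of a voxel along x, y, z.
    origin:     (3,) corner of the voxel grid (assumed parameter).
    Returns an (M, 3) array of voxel centers with M <= N.
    """
    voxel_size = np.asarray(voxel_size, dtype=np.float64)
    origin = np.asarray(origin, dtype=np.float64)

    # Integer grid index of the voxel that contains each point.
    idx = np.floor((points[:, :3] - origin) / voxel_size).astype(np.int64)

    # Keep each occupied voxel once, however many points fall inside it.
    occupied = np.unique(idx, axis=0)

    # The center of voxel (i, j, k) is origin + (index + 0.5) * voxel_size.
    return (occupied + 0.5) * voxel_size + origin
```

For example, with an assumed voxel size of `(0.05, 0.05, 0.1)`, two points that fall in the same cell collapse to a single center, so the output is never larger than the input while the coarse object shape is retained.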