Vehicle Detection from 3D Lidar Using Fully Convolutional Network

Bo Li, Tianlei Zhang, Tian Xia

TL;DR

This work addresses 3D vehicle detection from LiDAR range scans by projecting the data into a 2D point map and applying a single end-to-end fully convolutional network to predict per-point objectness and 3D bounding boxes. A novel 24D bounding-box encoding, derived from a per-point local coordinate system and rotation-invariant transforms, enables accurate 3D localization using 2D CNNs. The model is trained with balanced, multi-task losses and augmented data, and detections are refined via non-maximum suppression. On the KITTI dataset, the method achieves state-of-the-art or competitive performance in both offline world-space metrics and online evaluations, demonstrating the viability of FCN-based detection on lidar range data.

Abstract

Convolutional network techniques have recently achieved great success in vision-based detection tasks. This paper presents recent progress in our research on transferring the fully convolutional network technique to detection tasks on 3D range scan data. Specifically, the scenario is vehicle detection from the range data of a Velodyne 64E lidar. We propose to represent the data as a 2D point map and use a single 2D end-to-end fully convolutional network to predict the objectness confidence and the bounding boxes simultaneously. By carefully designing the bounding box encoding, the network is able to predict full 3D bounding boxes even though it is a 2D convolutional network. Experiments on the KITTI dataset show the state-of-the-art performance of the proposed method.
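To make the "2D point map" representation concrete, the following is a minimal sketch of projecting a lidar point cloud onto a cylindrical grid indexed by elevation (rows) and azimuth (columns). The function name `project_to_point_map`, the grid size, the vertical field of view, and the choice of storing horizontal distance `d` and height `z` per cell are illustrative assumptions, not values or code taken from the paper.

```python
import numpy as np

def project_to_point_map(points, n_rows=64, n_cols=512, v_fov_deg=(-25.0, 3.0)):
    """Project an (N, 3) array of lidar points (x, y, z) onto a 2D point map.

    Returns an (n_rows, n_cols, 2) array whose channels hold the horizontal
    distance d and the height z of the point falling in each cell.
    Grid size and field of view are assumed values for illustration.
    """
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    d = np.hypot(x, y)              # horizontal distance to the sensor
    theta = np.arctan2(y, x)        # azimuth angle in [-pi, pi]
    phi = np.arctan2(z, d)          # elevation angle

    phi_min, phi_max = np.deg2rad(v_fov_deg[0]), np.deg2rad(v_fov_deg[1])
    valid = (phi >= phi_min) & (phi < phi_max)   # drop points outside the vertical FOV

    # Discretize the angles into column (azimuth) and row (elevation) indices.
    cols = np.floor((theta + np.pi) / (2.0 * np.pi) * n_cols).astype(int)
    rows = np.floor((phi_max - phi) / (phi_max - phi_min) * n_rows).astype(int)
    cols = np.minimum(cols, n_cols - 1)
    rows = np.clip(rows, 0, n_rows - 1)

    # Write far points first so that the nearest point wins when cells collide.
    order = np.argsort(-d[valid])
    r, c = rows[valid][order], cols[valid][order]
    point_map = np.zeros((n_rows, n_cols, 2), dtype=np.float32)
    point_map[r, c, 0] = d[valid][order]
    point_map[r, c, 1] = z[valid][order]
    return point_map
```

Because the result is an ordinary dense image-like array, it can be fed to a standard 2D convolutional network, which is the property the abstract relies on.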

Paper Structure

This paper contains 18 sections, 9 equations, 5 figures, 2 tables.

Figures (5)

  • Figure 1: Data visualization generated at different stages of the proposed approach. (a) The input point map, with the $d$ channel visualized. (b) The output confidence map of the objectness branch at $\mathbf{o}^a_\mathbf{p}$. Red denotes higher confidence. (c) Bounding box candidates corresponding to all points predicted as positive, i.e. high-confidence points in (b). (d) Remaining bounding boxes after non-max suppression. Red points are the ground-truth points on vehicles, shown for reference.
  • Figure 2: The proposed FCN structure to predict vehicle objectness and bounding box simultaneously. The output feature maps of conv1/deconv5a, conv1/deconv5b and conv2/deconv4 are first concatenated and then passed to their respective subsequent layers.
  • Figure 3: (a) Illustration of Eq. (\ref{eq:transform}). For each vehicle point $\mathbf{p}$, we define a specific coordinate system centered at $\mathbf{p}$. The $x$ axis ($\mathbf{r}_x$) of the coordinate system lies along the ray from the Velodyne origin to $\mathbf{p}$ (dashed line). (b) An illustration of rotation invariance when observing a vehicle. Vehicles A and B have the same appearance. See Eq. (\ref{eq:transform}) in Section \ref{prediction_encoding} for details.
  • Figure 4: More examples of the detection results. See Section \ref{sec:offline} for details. (a) Detection result on a congested traffic scene. (b) Detection result on far vehicles.
  • Figure 5: Precision-recall curve in the offline evaluation, measured by the world space criterion. See Section \ref{sec:offline}.
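
The step from Figure 1(c) to Figure 1(d), where many overlapping box candidates are reduced to a few detections, can be sketched with a generic greedy non-max suppression. This is a simplified illustration under stated assumptions: it uses axis-aligned 2D boxes in the form `(x1, y1, x2, y2)` and an IoU threshold of 0.5, whereas the paper works with full 3D boxes and may score and merge candidates differently. The function names are hypothetical.

```python
import numpy as np

def iou_one_to_many(box, boxes):
    """IoU between one axis-aligned box and an (M, 4) array of boxes (x1, y1, x2, y2)."""
    x1 = np.maximum(box[0], boxes[:, 0])
    y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2])
    y2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area = (box[2] - box[0]) * (box[3] - box[1])
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / (area + areas - inter)

def greedy_nms(boxes, scores, iou_thresh=0.5):
    """Return indices of kept boxes, highest score first (generic greedy NMS)."""
    order = np.argsort(scores)[::-1]       # candidates sorted by descending confidence
    keep = []
    while order.size > 0:
        best = order[0]
        keep.append(int(best))
        rest = order[1:]
        # Discard every remaining candidate that overlaps the kept box too much.
        ious = iou_one_to_many(boxes[best], boxes[rest])
        order = rest[ious <= iou_thresh]
    return keep

# Two near-duplicate candidates and one distant candidate.
boxes = np.array([[0.0, 0.0, 2.0, 4.0], [0.1, 0.0, 2.1, 4.0], [10.0, 10.0, 12.0, 14.0]])
scores = np.array([0.9, 0.8, 0.7])
kept = greedy_nms(boxes, scores)
```

In this example the second box overlaps the first with an IoU of about 0.9 and is suppressed, so `kept` contains the first and third boxes.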