Pseudo-LiDAR++: Accurate Depth for 3D Object Detection in Autonomous Driving

Yurong You, Yan Wang, Wei-Lun Chao, Divyansh Garg, Geoff Pleiss, Bharath Hariharan, Mark Campbell, Kilian Q. Weinberger

TL;DR

The paper tackles the cost-depth trade-off in autonomous-driving 3D object detection by addressing depth estimation biases in stereo-based pseudo-LiDAR. It introduces a Stereo Depth Network (SDN) that learns direct depth using a depth-cost volume and a depth loss, plus a Graph-based Depth Correction (GDC) that fuses sparse LiDAR measurements with dense stereo depth through landmark-guided diffusion. On KITTI, SDN improves depth accuracy and BEV/3D detection, and GDC further enhances performance, with pseudo-LiDAR++ (PL++: SDN + GDC) approaching 64-beam LiDAR performance while using only 4 beams and stereo cameras. The approach promises substantial cost reductions and real-time feasibility, demonstrating that combining dense stereo depth with sparse high-precision measurements can rival expensive LiDAR-only systems.

Abstract

Detecting objects such as cars and pedestrians in 3D plays an indispensable role in autonomous driving. Existing approaches largely rely on expensive LiDAR sensors for accurate depth information. While recently pseudo-LiDAR has been introduced as a promising alternative, at a much lower cost based solely on stereo images, there is still a notable performance gap. In this paper we provide substantial advances to the pseudo-LiDAR framework through improvements in stereo depth estimation. Concretely, we adapt the stereo network architecture and loss function to be more aligned with accurate depth estimation of faraway objects --- currently the primary weakness of pseudo-LiDAR. Further, we explore the idea to leverage cheaper but extremely sparse LiDAR sensors, which alone provide insufficient information for 3D detection, to de-bias our depth estimation. We propose a depth-propagation algorithm, guided by the initial depth estimates, to diffuse these few exact measurements across the entire depth map. We show on the KITTI object detection benchmark that our combined approach yields substantial improvements in depth estimation and stereo-based 3D object detection --- outperforming the previous state-of-the-art detection accuracy for faraway objects by 40%. Our code is available at https://github.com/mileyan/Pseudo_Lidar_V2.

Paper Structure

This paper contains 22 sections, 10 equations, 14 figures, 13 tables.

Figures (14)

  • Figure 1: An illustration of our proposed depth estimation and correction method. The green box is the ground truth location of the car in the KITTI dataset. The red points are obtained with a stereo disparity network. Purple points, obtained with our stereo depth network (SDN), are much closer to the truth. After depth propagation (blue points) with a few (yellow) LiDAR measurements the car is squarely inside the green box. (One floor square is 1m$\times$1m.)
  • Figure 2: The disparity-to-depth transform. We set $f_U=721$ (in pixels) and $b=0.54$ (in meters), the typical values used in the KITTI dataset, in the disparity-to-depth equation.
  • Figure 3: Disparity cost volume (left) vs. depth cost volume (right). The figure shows the 3D points obtained from LiDAR (yellow) and stereo (purple) corresponding to a car in KITTI, seen from the bird's-eye view (BEV). Points from the disparity cost volume are stretched out and noisy, while points from the depth cost volume capture the car contour faithfully.
  • Figure 4: Depth estimation errors. We compare depth estimation error on 3,769 KITTI validation images, taking 64-beam LiDAR depths as ground truth. We separate pixels according to their true depths (z). See the text and appendix for details.
  • Figure 5: The whole pipeline of improved stereo depth estimation: (top) the stereo depth network (SDN) constructs a depth cost volume from left-right images and is optimized for direct depth estimation; (bottom) the graph-based depth correction algorithm (GDC) refines the depth map by leveraging a sparser LiDAR signal. The gray arrows indicate the observer's viewpoint. We superimpose the (green) ground-truth 3D box of a car, the same one as in Figure 1. The corrected points (blue; bottom right) are perfectly located inside the ground-truth box.
  • ...and 9 more figures
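
The disparity-to-depth transform in the Figure 2 caption can be illustrated with a short sketch. It assumes the standard stereo relation $z = f_U \cdot b / d$ and uses the caption's values $f_U=721$ and $b=0.54$. The function names and the one-pixel error scenario are hypothetical choices made for illustration and are not taken from the authors' code. The sketch shows why a fixed disparity error matters far more for faraway objects than for nearby ones.

```python
# Values quoted in the Figure 2 caption (typical for KITTI).
FOCAL_LENGTH_PX = 721.0  # f_U, horizontal focal length in pixels
BASELINE_M = 0.54        # b, stereo baseline in meters


def disparity_to_depth(disparity_px, f_u=FOCAL_LENGTH_PX, baseline=BASELINE_M):
    """Convert a disparity (pixels) to a depth (meters), assuming z = f_U * b / d."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive")
    return f_u * baseline / disparity_px


def depth_shift_for_disparity_error(depth_m, error_px=1.0,
                                    f_u=FOCAL_LENGTH_PX, baseline=BASELINE_M):
    """Depth overestimate (meters) when the disparity is underestimated by error_px.

    Hypothetical helper: it inverts the transform at the true depth, perturbs
    the disparity, and converts back to depth.
    """
    true_disparity = f_u * baseline / depth_m
    return disparity_to_depth(true_disparity - error_px, f_u, baseline) - depth_m


if __name__ == "__main__":
    for depth in (10.0, 30.0, 70.0):
        shift = depth_shift_for_disparity_error(depth)
        print(f"true depth {depth:5.1f} m -> shift from 1 px error: {shift:6.2f} m")
```

Under these assumptions, the same one-pixel disparity error moves a point at 70 m by many meters but a point at 10 m by well under one meter, which is consistent with the motivation for optimizing depth directly rather than disparity.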