Scene Completeness-Aware Lidar Depth Completion for Driving Scenario
Cho-Ying Wu, Ulrich Neumann
TL;DR
SCADC addresses the upper-scene depth gap in lidar depth completion by fusing stereo disparity with lidar depth. The method uses Attentional Point Confidence to weight each modality and a three-stage stacked hourglass regressor. The fused depth is $D_f = D_{stereo} \times M_{stereo} + D_{lidar} \times M_{lidar}$ with $M_{stereo} = 1 - M_{lidar}$, and training minimizes the total loss $L = L_1 + L_2 + L_3 + L_c$, where $L_c = \lVert M_{lidar} - M_g \rVert_2^2$. On KITTI Depth Completion, it improves upper-scene reconstruction while preserving lower-scene precision, outperforming the compared baselines, and it also improves outdoor RGB-D semantic segmentation when the completed depth is integrated with SSMA. The approach is relevant to driving scenarios in which large objects extend into the upper image regions, and it supports better downstream scene understanding.
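The fusion rule and confidence loss above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names, the toy arrays, the reading of $\times$ as element-wise multiplication, the interpretation of $M_g$ as a reference confidence map, and the reading of $L_1$, $L_2$, $L_3$ as one regression loss per hourglass stage are all assumptions made for illustration.

```python
import numpy as np

def fuse_depth(d_stereo, d_lidar, m_lidar):
    """D_f = D_stereo * M_stereo + D_lidar * M_lidar, with M_stereo = 1 - M_lidar.

    Assumption: all maps share one shape and the product is element-wise.
    """
    m_stereo = 1.0 - m_lidar
    return d_stereo * m_stereo + d_lidar * m_lidar

def confidence_loss(m_lidar, m_g):
    """L_c = ||M_lidar - M_g||_2^2, taken here as a sum of squared differences."""
    return float(np.sum((m_lidar - m_g) ** 2))

def total_loss(l1, l2, l3, l_c):
    """L = L_1 + L_2 + L_3 + L_c (assumed: one term per hourglass stage)."""
    return l1 + l2 + l3 + l_c

# Toy 2x2 example (hypothetical values, in meters).
d_stereo = np.array([[10.0, 20.0], [30.0, 40.0]])
d_lidar = np.array([[12.0, 0.0], [28.0, 0.0]])   # 0 where lidar has no return
m_lidar = np.array([[1.0, 0.0], [0.5, 0.0]])     # lidar confidence per pixel
m_g = np.array([[1.0, 0.0], [1.0, 0.0]])         # assumed reference confidence

fused = fuse_depth(d_stereo, d_lidar, m_lidar)   # [[12, 20], [29, 40]]
l_c = confidence_loss(m_lidar, m_g)              # 0.25
```

Because $M_{stereo} = 1 - M_{lidar}$, each fused pixel is a convex combination of the two depth sources whenever the lidar confidence lies in $[0, 1]$: pixels without a lidar return fall back to the stereo estimate, and confident lidar pixels keep the lidar value.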
Abstract
This paper introduces Scene Completeness-Aware Depth Completion (SCADC) to complete raw lidar scans into dense depth maps with fine and complete scene structures. Recent sparse depth completion methods for lidar focus only on the lower scene and produce irregular estimates in the upper scene, because existing datasets such as KITTI do not provide groundtruth for upper areas. These areas are considered less important since they are usually sky or trees, which are of less interest for scene understanding. However, we argue that in several driving scenarios, such as large trucks or cars with loads, objects can extend into the upper parts of the scene. Depth maps with structured upper-scene estimation are therefore important for RGBD algorithms. To help sparse lidar depth completion, SCADC adopts stereo images, which produce disparities with better scene completeness but are generally less precise than lidar. To our knowledge, we are the first to focus on scene completeness in sparse depth completion. We validate SCADC on both depth estimation precision and scene completeness on KITTI. Moreover, we experiment on less-explored outdoor RGBD semantic segmentation with scene completeness-aware D-input to validate our method.
