
Mind The Edge: Refining Depth Edges in Sparsely-Supervised Monocular Depth Estimation

Lior Talker, Aviad Cohen, Erez Yosef, Alexandra Dana, Michael Dinerstein

TL;DR

This paper proposes learning to detect the location of depth edges from densely supervised synthetic data and using the detected edges to supervise the depth edges during MDE training. It demonstrates significant gains in depth-edge accuracy, with comparable per-pixel depth accuracy, on several challenging datasets.

Abstract

Monocular Depth Estimation (MDE) is a fundamental problem in computer vision with numerous applications. Recently, LIDAR-supervised methods have achieved remarkable per-pixel depth accuracy in outdoor scenes. However, significant errors are typically found in the proximity of depth discontinuities, i.e., depth edges, which often hinder the performance of depth-dependent applications that are sensitive to such inaccuracies, e.g., novel view synthesis and augmented reality. Since direct supervision for the location of depth edges is typically unavailable in sparse LIDAR-based scenes, encouraging the MDE model to produce correct depth edges is not straightforward. To the best of our knowledge, this paper is the first attempt to address the depth-edge problem in LIDAR-supervised scenes. In this work, we propose to learn to detect the location of depth edges from densely supervised synthetic data and to use it to generate supervision for the depth edges in the MDE training. To quantitatively evaluate our approach, and due to the lack of depth-edge ground truth (GT) in LIDAR-based scenes, we manually annotated subsets of the KITTI and DDAD datasets with depth-edge GT. We demonstrate significant gains in the accuracy of the depth edges with comparable per-pixel depth accuracy on several challenging datasets. Code and datasets are available at https://github.com/liortalker/MindTheEdge.
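The evaluation against the annotated depth-edge GT is reported as precision and recall (Figure 5). The sketch below shows one common way to compute such numbers for binary edge maps; it is a minimal illustration, not the authors' evaluation code. It assumes edge maps stored as nested lists, a hypothetical pixel tolerance `tol` measured with the Chebyshev distance, and simple nearest-pixel matching (benchmarks such as BSDS use one-to-one bipartite matching instead, so the paper's protocol may differ). The helper names `edge_precision_recall` and `pr_curve` are hypothetical.

```python
from typing import List, Sequence, Tuple

Grid = List[List[int]]  # binary edge map: 1 = edge pixel, 0 = background


def _edge_pixels(edge_map: Grid) -> List[Tuple[int, int]]:
    """Return the (row, col) coordinates of all edge pixels."""
    return [(r, c) for r, row in enumerate(edge_map) for c, v in enumerate(row) if v]


def _has_match(p: Tuple[int, int], others: Sequence[Tuple[int, int]], tol: int) -> bool:
    """True if some pixel in `others` lies within Chebyshev distance `tol` of `p`."""
    return any(max(abs(p[0] - q[0]), abs(p[1] - q[1])) <= tol for q in others)


def edge_precision_recall(pred: Grid, gt: Grid, tol: int = 1) -> Tuple[float, float]:
    """Tolerance-based precision and recall of a predicted edge map against GT.

    Precision: fraction of predicted edge pixels within `tol` of some GT edge pixel.
    Recall: fraction of GT edge pixels within `tol` of some predicted edge pixel.
    Empty sets yield 0.0 by convention (an assumption of this sketch).
    """
    p_pix, g_pix = _edge_pixels(pred), _edge_pixels(gt)
    precision = sum(_has_match(p, g_pix, tol) for p in p_pix) / len(p_pix) if p_pix else 0.0
    recall = sum(_has_match(g, p_pix, tol) for g in g_pix) / len(g_pix) if g_pix else 0.0
    return precision, recall


def pr_curve(prob: List[List[float]], gt: Grid, thresholds: Sequence[float], tol: int = 1):
    """Binarize an edge-probability map at each threshold; return (threshold, P, R) tuples."""
    curve = []
    for t in thresholds:
        binary = [[1 if v > t else 0 for v in row] for row in prob]
        curve.append((t, *edge_precision_recall(binary, gt, tol)))
    return curve
```

On a toy 4×5 grid whose GT edge is column 2, a prediction shifted one pixel to the right in the top two rows plus one spurious pixel at (3, 0) gives precision 2/3 and recall 0.75 with `tol=1`, and (0.0, 0.0) with `tol=0`. The brute-force matching is O(|pred|·|GT|), which is fine for illustration but slow on full-resolution images.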

Paper Structure

This paper contains 33 sections, 7 equations, 19 figures, and 4 tables.

Figures (19)

  • Figure 1: Refining depth edges with our method when using Packnet-SAN guizilini2021sparse as the MDE baseline. (a) Depth estimation (DDAD dataset). Zoom in on the crops (on the right) to see the improved 2D localization of the depth edges with our method compared to the baseline. (b) Augmented reality: virtual objects placed in a scene from the KITTI dataset for AR applications. Zoom in and inspect the object boundaries to best appreciate the accuracy of the depth edges.
  • Figure 2: Overview of our proposed MDE training method. (A) Training the DEE model on synthetic data. (B) Inferring depth edges on the training set of the real data using the trained DEE model. (C) Training the MDE model on real data with the Edge Loss (EL) using the supervision from the previous step. (D) Inference using the MDE model on the real data. Solid and broken lines represent the dataflow and the GT used in loss functions, respectively.
  • Figure 3: The density of the LIDAR measurements near edges in a subset of the KITTI dataset (our proposed KITTI-DE dataset). Denote the set of all pixels at distance $d$ from the closest edge as $P_d$, and the subset of pixels $p \in P_d$ that have a LIDAR measurement as $P_d^L$. (a) The fraction of pixels in $P_d$ with a LIDAR measurement, $|P_d^L|/|P_d|$, as a function of $d$. (b) An example from the KITTI dataset of a gap in the LIDAR measurements (left of the pole) and an infiltration of LIDAR measurements from the background into the pole (right of the pole). For visualization purposes, the LIDAR measurements are dilated.
  • Figure 4: Examples of depth predictions in the KITTI-DE dataset. The depth predictions for the baseline and our method correspond to Packnet-SAN and Packnet-SAN+EL, respectively.
  • Figure 5: Precision and recall of the depth edges on the KITTI-DE and DDAD-DE evaluation sets. Each point on the graphs of the MDE methods is generated with different parameters of the Canny edge detector. Each point on the graph of the DEE method is generated by thresholding the depth-edge probability at a different value in the range $(0,1)$.
  • ...and 14 more figures
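The quantity plotted in Figure 3(a), $|P_d^L|/|P_d|$, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: edge and LIDAR-validity masks are nested lists of 0/1, and the distance $d$ to the closest edge is taken to be the 4-connected (city-block) distance computed by a multi-source BFS. The caption does not specify the distance metric, so that choice is an assumption. The function names `distance_to_edges` and `lidar_density_by_distance` are hypothetical.

```python
from collections import deque
from typing import Dict, List, Optional


def distance_to_edges(edges: List[List[int]]) -> List[List[Optional[int]]]:
    """Multi-source BFS: city-block distance from every pixel to the nearest edge pixel.

    Returns None for every pixel if the image contains no edge pixels.
    """
    h, w = len(edges), len(edges[0])
    dist: List[List[Optional[int]]] = [[None] * w for _ in range(h)]
    queue = deque()
    for r in range(h):
        for c in range(w):
            if edges[r][c]:
                dist[r][c] = 0  # edge pixels are the BFS sources
                queue.append((r, c))
    while queue:
        r, c = queue.popleft()
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < h and 0 <= nc < w and dist[nr][nc] is None:
                dist[nr][nc] = dist[r][c] + 1
                queue.append((nr, nc))
    return dist


def lidar_density_by_distance(edges: List[List[int]],
                              lidar_valid: List[List[int]]) -> Dict[int, float]:
    """Return {d: |P_d^L| / |P_d|}: fraction of pixels at distance d that have LIDAR."""
    dist = distance_to_edges(edges)
    total: Dict[int, int] = {}       # |P_d|
    with_lidar: Dict[int, int] = {}  # |P_d^L|
    for r, row in enumerate(dist):
        for c, d in enumerate(row):
            if d is None:  # no edges anywhere in this image
                continue
            total[d] = total.get(d, 0) + 1
            if lidar_valid[r][c]:
                with_lidar[d] = with_lidar.get(d, 0) + 1
    return {d: with_lidar.get(d, 0) / total[d] for d in sorted(total)}
```

For example, with an edge along column 0 of a 2×4 image and LIDAR rows `[0, 0, 1, 1]` and `[0, 1, 1, 1]`, the function returns `{0: 0.0, 1: 0.5, 2: 1.0, 3: 1.0}`. BFS gives exact city-block distances in O(HW); for Euclidean distances on real images, a distance transform such as `scipy.ndimage.distance_transform_edt` would be the usual choice.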