All-day Depth Completion

Vadim Ezhov, Hyoungseob Park, Zhaoyang Zhang, Rishi Upadhyay, Howard Zhang, Chethan Chinder Chandrappa, Achuta Kadambi, Yunhao Ba, Julie Dorsey, Alex Wong

TL;DR

This work addresses all-day depth estimation by fusing RGB images with synchronized sparse LiDAR depth. It introduces SpaDe, a lightweight network trained on synthetic data to predict a dense depth map $\hat{z}$ and its uncertainty $\hat{\sigma}$ from sparse input $z$, providing a robust depth prior. SpaDe can operate in plug-and-play mode or as part of an uncertainty-driven residual learning (URL) scheme that adaptively fuses SpaDe output with a downstream depth completion model, yielding improved performance under day, night, and all-day conditions. Across nuScenes, Waymo, and synthetic datasets, SpaDe and URL achieve consistent, substantial improvements over baseline methods, demonstrating the practical value of sensor fusion and uncertainty-guided refinement for all-day depth estimation.

Abstract

We propose a method for depth estimation under different illumination conditions, i.e., day and night time. As photometry is uninformative in regions under low illumination, we tackle the problem through a multi-sensor fusion approach, where we take as input an additional synchronized sparse point cloud (i.e., from a LiDAR) projected onto the image plane as a sparse depth map, along with a camera image. The crux of our method lies in the use of abundantly available synthetic data to first approximate the 3D scene structure by learning a mapping from sparse to (coarse) dense depth maps along with their predictive uncertainty - we term this SpaDe. In poorly illuminated regions where photometric intensities do not afford the inference of local shape, the coarse approximation of scene depth serves as a prior; the uncertainty map is then used with the image to guide refinement through an uncertainty-driven residual learning (URL) scheme. The resulting depth completion network leverages complementary strengths from both modalities - depth is sparse but insensitive to illumination and in metric scale, while the image is dense but sensitive to illumination and ambiguous in scale. SpaDe can be used in a plug-and-play fashion, which allows for a 25% improvement when augmented onto existing methods to preprocess sparse depth. We demonstrate URL on the nuScenes dataset, where we improve over all baselines by an average of 11.65% in all-day scenarios, 11.23% when tested specifically for daytime, and 13.12% for nighttime scenes.
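
The sparse depth map mentioned in the abstract is obtained by projecting the LiDAR point cloud onto the image plane. A minimal NumPy sketch of that standard pinhole projection follows. The function name `project_to_sparse_depth`, the convention that points are already expressed in the camera frame (z pointing forward), and the use of 0 to mark pixels without a LiDAR return are assumptions made for illustration; they are not taken from the paper.

```python
import numpy as np

def project_to_sparse_depth(points_cam, K, height, width):
    """Project 3D points onto the image plane as a sparse depth map.

    points_cam: (N, 3) points in the camera frame, z forward (assumed convention).
    K:          (3, 3) pinhole intrinsics matrix.
    Returns an (height, width) float32 map; 0 marks pixels with no point (assumed).
    """
    points_cam = np.asarray(points_cam, dtype=np.float64)

    # Keep only points in front of the camera.
    pts = points_cam[points_cam[:, 2] > 0]

    # Pinhole projection: [u*z, v*z, z]^T = K @ [x, y, z]^T.
    uvw = pts @ np.asarray(K, dtype=np.float64).T
    u = np.round(uvw[:, 0] / uvw[:, 2]).astype(int)
    v = np.round(uvw[:, 1] / uvw[:, 2]).astype(int)
    z = pts[:, 2].astype(np.float32)

    # Discard points that fall outside the image bounds.
    inside = (u >= 0) & (u < width) & (v >= 0) & (v < height)
    u, v, z = u[inside], v[inside], z[inside]

    # When several points land on one pixel, keep the nearest one.
    depth = np.full((height, width), np.inf, dtype=np.float32)
    np.minimum.at(depth, (v, u), z)
    depth[np.isinf(depth)] = 0.0
    return depth
```

Keeping the nearest point per pixel is one reasonable way to handle occlusion when multiple returns share a pixel; other conventions are possible.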

Paper Structure (10 sections, 7 equations, 4 figures, 2 tables)

Figures (4)

  • Figure 1: An Illustration of Uncertainty-driven Residual Learning. (a) Our Sparse-to-Dense module (SpaDe) is trained on synthetic data to approximate dense depth from sparse points. (b) SpaDe is used as an inductive bias in Uncertainty-driven Residual Learning (URL), and the depth estimation model is trained to adaptively refine the approximated depth based on the estimated log uncertainty. SpaDe can also be used in a plug-and-play manner to enable all-day depth estimation for a pretrained depth estimator without training.
  • Figure 2: Sparse-to-Dense module (SpaDe) on a real dataset. The boxes highlight the alignment between SpaDe's predictive uncertainty and depth discontinuity regions, which are often erroneous. The uncertainty aligns well with the error measured against ground truth.
  • Figure 3: Representative results of all-day depth estimation on nuScenes day and night images. The regions for detailed comparison are highlighted by boxes. URL performs better in (a) low-illumination conditions, (b) regions with missing sparse points, and (c) depth discontinuity regions.
  • Figure 4: Ablation Study on nuScenes. Predictions from SpaDe serve as a strong inductive bias for downstream depth completion models. Augmenting KITTI-pretrained models with SpaDe improves estimates in regions where photometry is uninformative (highlighted).
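
The caption of Figure 1(b) describes a model that refines SpaDe's approximated depth according to the estimated log uncertainty. The sketch below illustrates the general idea of uncertainty-gated residual refinement. It is not the paper's formulation: the sigmoid gate, the function name `uncertainty_gated_refinement`, and the assumption that a downstream network supplies a per-pixel residual are all hypothetical choices made for illustration.

```python
import numpy as np

def uncertainty_gated_refinement(prior_depth, log_uncertainty, residual):
    """Refine a coarse depth prior with a residual, gated by uncertainty.

    prior_depth:     (H, W) coarse dense depth (e.g. a SpaDe-style prior).
    log_uncertainty: (H, W) log of the predictive uncertainty of the prior.
    residual:        (H, W) correction proposed by a downstream model (assumed input).
    """
    prior_depth = np.asarray(prior_depth, dtype=np.float64)
    log_uncertainty = np.asarray(log_uncertainty, dtype=np.float64)
    residual = np.asarray(residual, dtype=np.float64)

    # Hypothetical gate: sigmoid of the log uncertainty, a value in (0, 1).
    # Confident pixels (low uncertainty) keep the prior almost unchanged;
    # uncertain pixels accept most of the residual.
    gate = 1.0 / (1.0 + np.exp(-log_uncertainty))
    return prior_depth + gate * residual

# Usage: the first pixel is confident, the second is highly uncertain.
prior = np.array([[10.0, 10.0]])
log_sigma = np.array([[-20.0, 20.0]])
delta = np.array([[2.0, 2.0]])
refined = uncertainty_gated_refinement(prior, log_sigma, delta)
```

In this toy example the confident pixel stays close to its prior value of 10, while the uncertain pixel moves almost fully to 12.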