
SteeredMarigold: Steering Diffusion Towards Depth Completion of Largely Incomplete Depth Maps

Jakub Gregorek, Lazaros Nalpantidis

TL;DR

SteeredMarigold tackles depth completion under extreme depth sparsity by conditioning a pre-trained diffusion-based monocular depth estimator (Marigold) on sparse, metric depth points. Diffusion runs in the latent space of a VAE, and a plug-and-play steering mechanism nudges the diffusion process toward the known depth values through linear interpolations and a steering factor $\lambda$, without any training. The approach yields metrically consistent depth even when large parts of the depth map are empty, achieving state-of-the-art results on NYUv2 in such scenarios. While highly effective when large areas lack depth, it trades real-time performance for accuracy and has yet to be validated on more datasets and modalities.
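The steering idea in the TL;DR can be illustrated on a single scanline. This is a minimal sketch under our own assumptions, not the authors' implementation: we read $\phi_1$ as a linear interpolation of the current clean-depth estimate $\tilde{\mathbf{x}}^{\mathcal{D}}_{0}$ sampled at the sparse pixels, $\phi_2$ as a linear interpolation of the measured sparse depth, and $\lambda$ as a blend weight between the estimate and the steered target $\tilde{\mathbf{x}}^{\mathcal{D}}_{0} - \phi_1 + \phi_2$. The helper `steer_scanline` is hypothetical, the estimate is assumed to be already in metric units, and NumPy's 1D `np.interp` (constant extrapolation beyond the outermost samples) stands in for whatever 2D interpolation the method actually uses.

```python
import numpy as np


def steer_scanline(x0_depth, sparse_idx, sparse_depth, lam):
    """Hypothetical 1D steering step (assumed reading of phi1, phi2 and lambda).

    x0_depth:     current clean-depth estimate along one scanline (metric units assumed)
    sparse_idx:   pixel positions that carry a depth measurement
    sparse_depth: measured depth at those positions
    lam:          steering factor; 0 keeps the estimate, 1 returns the steered target
    """
    x0_depth = np.asarray(x0_depth, dtype=float)
    cols = np.arange(x0_depth.size)
    # np.interp requires increasing sample positions, so sort the sparse points.
    order = np.argsort(sparse_idx)
    idx = np.asarray(sparse_idx)[order]
    meas = np.asarray(sparse_depth, dtype=float)[order]
    # phi1: linear interpolation of the estimate's own values at the sparse pixels.
    phi1 = np.interp(cols, idx, x0_depth[idx])
    # phi2: linear interpolation of the measured sparse depth.
    phi2 = np.interp(cols, idx, meas)
    # Unscaled steering target, as visualized in Figure 3.
    target = x0_depth - phi1 + phi2
    # Assumption: lambda blends the estimate toward the steering target.
    return x0_depth + lam * (target - x0_depth)
```

Because $\phi_1$ and $\phi_2$ are built from the same sample positions, the steered target equals the measurement exactly at every sparse pixel while keeping the estimate's relative structure in between. For example, an estimate `[1.0, 1.5, 1.0, 1.5, 1.0]` steered toward a depth of 2.0 at both ends with `lam=1.0` becomes `[2.0, 2.5, 2.0, 2.5, 2.0]`.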

Abstract

Even though depth maps captured by RGB-D sensors deployed in real environments often contain large areas without valid depth measurements, the vast majority of depth completion methods still assume that depth values cover all areas of the scene. To address this limitation, we introduce SteeredMarigold, a training-free, zero-shot depth completion method capable of producing dense metric depth, even for largely incomplete depth maps. SteeredMarigold achieves this by using the available sparse depth points as conditions to steer a denoising diffusion probabilistic model. Our method outperforms relevant top-performing methods on the NYUv2 dataset in tests where no depth is provided for a large area, achieving state-of-the-art performance and exhibiting remarkable robustness against depth map incompleteness. Our source code is publicly available at https://steeredmarigold.github.io.
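The abstract describes steering a denoising diffusion probabilistic model with sparse depth. The loop below is a minimal sketch of where such a hook could sit in standard DDPM ancestral sampling: at every step the predicted clean sample is decoded, steered, and re-encoded with the single encoder/decoder pair of Figure 2 before the posterior $q(\mathbf{x}_{t-1}\mid\mathbf{x}_t,\tilde{\mathbf{x}}_0)$ is sampled. `steered_ddpm_sample`, `eps_model`, `steer`, `encode`, and `decode` are hypothetical placeholders; Marigold's actual U-Net, VAE, noise schedule, and scheduler are not modeled.

```python
import numpy as np


def steered_ddpm_sample(eps_model, steer, encode, decode, shape, betas, rng):
    """DDPM ancestral sampling with a steering hook on the predicted clean sample.

    eps_model(x_t, t) -> predicted noise (placeholder for a conditioned denoiser)
    decode / encode   -> map latent <-> depth (one shared pair, as in Figure 2)
    steer(depth)      -> depth nudged toward the sparse measurements
    All callables are hypothetical placeholders.
    """
    betas = np.asarray(betas, dtype=float)
    alphas = 1.0 - betas
    alpha_bars = np.cumprod(alphas)
    x_t = rng.standard_normal(shape)  # x_T ~ N(0, I)
    for t in reversed(range(len(betas))):
        ab_t = alpha_bars[t]
        ab_prev = alpha_bars[t - 1] if t > 0 else 1.0
        eps = eps_model(x_t, t)
        # Predicted clean sample recovered from the noise estimate.
        x0 = (x_t - np.sqrt(1.0 - ab_t) * eps) / np.sqrt(ab_t)
        # Steering hook: decode, nudge toward the sparse depth, re-encode.
        x0 = encode(steer(decode(x0)))
        # Mean of the DDPM posterior q(x_{t-1} | x_t, x0).
        coef_x0 = np.sqrt(ab_prev) * betas[t] / (1.0 - ab_t)
        coef_xt = np.sqrt(alphas[t]) * (1.0 - ab_prev) / (1.0 - ab_t)
        mean = coef_x0 * x0 + coef_xt * x_t
        if t > 0:
            # Posterior variance; no noise is added at the final step.
            var = (1.0 - ab_prev) / (1.0 - ab_t) * betas[t]
            x_t = mean + np.sqrt(var) * rng.standard_normal(shape)
        else:
            x_t = mean
    return x_t
```

Passing identity functions for `encode` and `decode` and a denoiser that always predicts the noise of a fixed target reduces the loop to returning `steer(target)` at the last step, which makes a convenient sanity check for the hook.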

Paper Structure

This paper contains 10 sections, 11 equations, 5 figures, 2 tables, and 1 algorithm.

Figures (5)

  • Figure 1: RGB-D sensors often fail to provide robots with depth measurements in large areas due to large distances (case on the left) or lighting/material properties (case on the right). The task of completing such largely incomplete depth maps is much more challenging than the typical scenarios considered in the depth completion literature.
  • Figure 2: SteeredMarigold architecture. Our plug-and-play steering module expands the Marigold diffusion model to perform depth completion. Note that there is only one instance of the encoder $\mathcal{E}$ and decoder $\mathcal{D}$.
  • Figure 3: The diffusion process harmonizes the depth estimate with the steering direction. Depth points sampled from the ground truth (a) using the mask (b) steer the diffusion process in the direction obtained by modifying $\tilde{\mathbf{x}}^{\mathcal{D}}_{0}$ with $\phi_1$ and $\phi_2$. The effects of steering are clearly visible in the initial steps (c) of the reverse diffusion process and become less visible in the later steps (d), as the diffusion process harmonizes the regions unaffected by steering with those affected by it. The steering direction in (c) and (d) corresponds to $\tilde{\mathbf{x}}^{\mathcal{D}}_{0}$ after subtracting $\phi_1$ and adding $\phi_2$. The visualization does not take the steering factor $\lambda$ into account.
  • Figure 4: The three evaluation areas considered: large area ($608\times448$, the entire image), medium area ($408\times248$, equal to the area with depth removed) and small area ($358\times198$).
  • Figure 5: Visualization of a completed scene by BP-Net (a), CompletionFormer (b) and our method (c). In the top row, the models were provided with depth samples covering the entire scene. In the second row, no depth samples were provided in the central area of $408\times248$. BP-Net and CompletionFormer struggle to complete the scene when no depth values are available in that region.
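Figures 4 and 5 fix the evaluation geometry. The sketch below builds a largely incomplete input by sampling sparse depth from the ground truth and removing every sample inside the central $408\times248$ region, then computes RMSE over one of the three evaluation areas. It assumes a $608\times448$ (width by height) image, centered areas, uniform sampling of `n_points` valid pixels before the hole is cut out, and RMSE as the metric; the function names and the sample budget are hypothetical, not taken from the paper.

```python
import numpy as np

# (height, width) of each evaluation area, read from Figure 4 as width x height.
EVAL_AREAS = {"large": (448, 608), "medium": (248, 408), "small": (198, 358)}
HOLE = (248, 408)  # central region left without depth samples (Figure 5)


def center_slices(shape, size):
    """Row/column slices of a centered (h, w) window; centering is an assumption."""
    h, w = size
    top, left = (shape[0] - h) // 2, (shape[1] - w) // 2
    return slice(top, top + h), slice(left, left + w)


def make_incomplete_sparse_depth(gt, n_points, rng, hole=HOLE):
    """Hypothetical protocol: sample valid ground-truth pixels, then empty the hole."""
    valid = np.flatnonzero(gt > 0)
    chosen = rng.choice(valid, size=min(n_points, valid.size), replace=False)
    mask = np.zeros(gt.shape, dtype=bool)
    mask.flat[chosen] = True
    # Drop every sample that falls inside the central hole.
    mask[center_slices(gt.shape, hole)] = False
    return np.where(mask, gt, 0.0), mask


def rmse_in_area(pred, gt, area):
    """RMSE over valid ground-truth pixels inside a centered evaluation area."""
    rows, cols = center_slices(gt.shape, EVAL_AREAS[area])
    p, g = pred[rows, cols], gt[rows, cols]
    valid = g > 0
    return float(np.sqrt(np.mean((p[valid] - g[valid]) ** 2)))
```

With these sizes the medium area coincides with the hole (rows 100 to 347, columns 100 to 507), so a "medium" score measures only pixels for which the model received no depth at all.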