Need for Speed: Zero-Shot Depth Completion with Single-Step Diffusion

Jakub Gregorek, Paraskevas Pegios, Nando Metzger, Konrad Schindler, Theodora Kontogianni, Lazaros Nalpantidis

Abstract

We introduce Marigold-SSD, a single-step, late-fusion depth completion framework that leverages strong diffusion priors while eliminating the costly test-time optimization typically associated with diffusion-based methods. By shifting computational burden from inference to finetuning, our approach enables efficient and robust 3D perception under real-world latency constraints. Marigold-SSD achieves significantly faster inference with a training cost of only 4.5 GPU days. We evaluate our method across four indoor and two outdoor benchmarks, demonstrating strong cross-domain generalization and zero-shot performance compared to existing depth completion approaches. Our approach significantly narrows the efficiency gap between diffusion-based and discriminative models. Finally, we challenge common evaluation protocols by analyzing performance under varying input sparsity levels.

Project page: https://dtu-pas.github.io/marigold-ssd/
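The abstract's last point, evaluating under varying input sparsity, can be illustrated with a minimal sketch. It is not the paper's protocol: the uniform random sampling, the mean-absolute-error metric, and the names `sample_sparse_depth`, `mae_at_sparsity`, and `predict` are all assumptions made for illustration, and `predict` stands in for any depth completion model.

```python
import numpy as np

def sample_sparse_depth(dense_depth, num_points, rng):
    """Keep `num_points` randomly chosen valid pixels; all others become 0.

    Assumption: uniform random sampling over pixels with depth > 0.
    """
    valid = np.flatnonzero(dense_depth > 0)
    keep = rng.choice(valid, size=min(num_points, valid.size), replace=False)
    sparse = np.zeros_like(dense_depth)
    sparse.flat[keep] = dense_depth.flat[keep]
    return sparse

def mae_at_sparsity(predict, dense_depth, point_counts, seed=0):
    """Evaluate a completion function at several sparsity levels.

    `predict` is a hypothetical callable mapping a sparse depth map to a
    dense prediction of the same shape.
    """
    rng = np.random.default_rng(seed)
    valid = dense_depth > 0
    results = {}
    for n in point_counts:
        sparse = sample_sparse_depth(dense_depth, n, rng)
        pred = predict(sparse)
        # Mean absolute error over pixels that have ground truth.
        results[n] = float(np.abs(pred - dense_depth)[valid].mean())
    return results

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    dense = rng.uniform(1.0, 10.0, size=(32, 32))  # synthetic ground truth
    # Placeholder "model": fill the map with the mean of the observed points.
    mean_fill = lambda s: np.full_like(s, s[s > 0].mean())
    print(mae_at_sparsity(mean_fill, dense, [10, 100, 500]))
```

Sweeping `point_counts` instead of fixing a single value is what exposes how a method degrades as the sparse input thins out.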

Paper Structure (13 sections, 2 equations, 8 figures, 4 tables)

Figures (8)

  • Figure 1: Performance vs speed trade-off. Comparison of our method Marigold-SSD with other diffusion-based approaches, Marigold-DC [marigold-dc] and Marigold-E2E [marigold-e2e] + LS (w/o sparse condition), as well as discriminative baselines [nlspn, completion-former] on the KITTI dataset [kitti-dataset]. Marigold-SSD occupies a unique region in the trade-off space, closing the efficiency gap to discriminative methods while retaining the benefit of the strong diffusion prior.
  • Figure 2: Marigold-SSD for zero-shot depth completion. We present a single-step diffusion framework with end-to-end fine-tuning as an efficient alternative to the test-time optimization approach of Marigold-DC [marigold-dc]. To this end, we introduce a conditional decoder with late fusion to incorporate sparse depth measurements. At inference, our method Marigold-SSD produces high-quality results in a single step, while Marigold-DC typically requires 50 optimization steps per inference and often ensembles 10 inferences for further improvements.
  • Figure 3: Internal architecture of the conditional decoder. $\mathcal{D}_{\mathbf{C}}$ consists of the VAE decoder $\mathcal{D}$ (top row) and blocks processing the sparse condition $\mathbf{C}$ (bottom row), adapted from the VAE encoder $\mathcal{E}$ (differing in down-sampling positions). Feature maps are concatenated channel-wise ($\oplus$) at five levels, and the fusion blocks use $1\times1$ convolutions (Eq. \ref{eq:fusion}). Conv denotes standard convolution layers; UP, DOWN, and MID blocks are ResNet-based [resnet], and MID blocks additionally contain an attention layer.
  • Figure 4: Qualitative results. Marigold-SSD generally produces smoother depth maps than Marigold-DC [marigold-dc], which tends to over-refine details, which can lead to unrealistic scene structures.
  • Figure 5: Qualitative results. Both Marigold-SSD and Marigold-DC tend to underestimate sky depth on KITTI and DDAD, consistent with prior Marigold limitations and limited conditioning information in the sky, while they differ in how they estimate fine scene details.
  • ...and 3 more figures
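
The late-fusion step described in the Figure 3 caption (channel-wise concatenation followed by a $1\times1$ convolution) can be sketched in NumPy as follows. This is an illustrative reading of the caption, not the authors' implementation. The exact form of the fusion equation is not reproduced on this page, so the following are assumptions: the output has the same number of channels as the decoder features, and the `passthrough_init` helper (identity on decoder channels, zeros on condition channels) is a hypothetical initialisation included only to make the behaviour easy to check.

```python
import numpy as np

def conv1x1(x, weight, bias):
    """1x1 convolution, i.e. a per-pixel linear map over channels.

    x: (C_in, H, W), weight: (C_out, C_in), bias: (C_out,)
    """
    return np.einsum("oc,chw->ohw", weight, x) + bias[:, None, None]

def late_fusion(decoder_feat, condition_feat, weight, bias):
    """Concatenate decoder and condition features channel-wise, then
    project the stack with a 1x1 convolution."""
    if decoder_feat.shape[1:] != condition_feat.shape[1:]:
        raise ValueError("feature maps must share spatial size at a fusion level")
    stacked = np.concatenate([decoder_feat, condition_feat], axis=0)
    return conv1x1(stacked, weight, bias)

def passthrough_init(c_dec, c_cond):
    """Hypothetical initialisation (an assumption, not from the paper):
    copy decoder channels unchanged and ignore the condition branch."""
    weight = np.concatenate([np.eye(c_dec), np.zeros((c_dec, c_cond))], axis=1)
    bias = np.zeros(c_dec)
    return weight, bias

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    dec = rng.normal(size=(4, 8, 8))   # decoder features at one level
    cond = rng.normal(size=(3, 8, 8))  # sparse-condition features, same level
    w, b = passthrough_init(4, 3)
    fused = late_fusion(dec, cond, w, b)
    print(fused.shape, np.allclose(fused, dec))
```

In a full decoder, a block of this kind would be applied at each of the five fusion levels, with weights learned during fine-tuning rather than fixed as in this sketch.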