Zero-shot Depth Completion via Test-time Alignment with Affine-invariant Depth Prior
Lee Hyoseok, Kyeong Seon Kim, Kwon Byung-Ki, Tae-Hyun Oh
TL;DR
The paper addresses domain-robust depth completion from sparse depth measurements by leveraging the affine-invariant depth priors learned by pre-trained monocular depth diffusion models. It introduces a zero-shot method that performs test-time alignment to fuse the diffusion-based depth prior with metric sparse measurements, enforcing hard data consistency through an optimization loop and correcting the diffusion latent with scheduled noise. A prior-based outlier filtering step and a set of losses, comprising sparse-depth consistency, edge-aware smoothness, and Relative Structure Similarity (R-SSIM), help preserve scene structure and depth affinity across domains. Empirically, the approach generalizes well across indoor and outdoor datasets, surpasses several depth-prior-based and unsupervised baselines, and highlights the practical potential of foundation-model priors for robust depth completion without domain-specific training.
Abstract
Depth completion, predicting dense depth maps from sparse depth measurements, is an ill-posed problem requiring prior knowledge. Recent methods adopt learning-based approaches to implicitly capture priors, but the priors primarily fit in-domain data and do not generalize well to out-of-domain scenarios. To address this, we propose a zero-shot depth completion method composed of an affine-invariant depth diffusion model and test-time alignment. We use pre-trained depth diffusion models as the depth prior, since they implicitly understand how to fill in depth for scenes. Our approach aligns the affine-invariant depth prior with metric-scale sparse measurements, enforcing them as hard constraints via an optimization loop at test time. Our zero-shot depth completion method generalizes across datasets from diverse domains, achieving up to a 21% average performance improvement over the previous state-of-the-art methods while enhancing spatial understanding by sharpening scene details. We demonstrate that aligning a monocular affine-invariant depth prior with sparse metric measurements is an effective strategy for achieving domain-generalizable depth completion without relying on extensive training data. Project page: https://hyoseok1223.github.io/zero-shot-depth-completion/.
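To make the term "affine-invariant" concrete: such a prior gives depth only up to an unknown scale and shift, so sparse metric measurements are needed to pin those two numbers down. The sketch below is not the test-time alignment proposed in the paper, which optimizes inside the diffusion process and enforces the measurements as hard constraints. It shows only the simpler, widely used global least-squares fit of one scale and one shift. The function names `fit_scale_shift` and `align_prior` and the dictionary layout of the sparse measurements are assumptions made for illustration.

```python
from typing import Dict, List, Sequence, Tuple


def fit_scale_shift(prior: Sequence[float], metric: Sequence[float]) -> Tuple[float, float]:
    """Closed-form least-squares (scale, shift) so that scale * prior + shift ~ metric."""
    n = len(prior)
    if n != len(metric) or n < 2:
        raise ValueError("need at least two paired samples")
    mean_p = sum(prior) / n
    mean_m = sum(metric) / n
    var_p = sum((p - mean_p) ** 2 for p in prior)
    if var_p == 0.0:
        raise ValueError("prior values are constant; scale is undetermined")
    cov = sum((p - mean_p) * (m - mean_m) for p, m in zip(prior, metric))
    scale = cov / var_p
    shift = mean_m - scale * mean_p
    return scale, shift


def align_prior(
    prior_map: List[List[float]],
    sparse: Dict[Tuple[int, int], float],
) -> List[List[float]]:
    """Map a relative depth map to metric scale using sparse metric samples.

    prior_map: 2D list of affine-invariant (relative) depth values.
    sparse:    hypothetical layout {(row, col): metric depth} for measured pixels.
    """
    samples = list(sparse.items())
    prior_vals = [prior_map[r][c] for (r, c), _ in samples]
    metric_vals = [depth for _, depth in samples]
    scale, shift = fit_scale_shift(prior_vals, metric_vals)
    # Apply the single global affine transform to every pixel.
    return [[scale * v + shift for v in row] for row in prior_map]
```

For example, `align_prior([[0.0, 0.5], [1.0, 0.25]], {(0, 0): 2.0, (1, 0): 6.0})` recovers scale 4 and shift 2 and returns `[[2.0, 4.0], [6.0, 3.0]]`. With more than two measurements a single global transform generally leaves residuals at the measured pixels, which is one reason a method might prefer to enforce the measurements as hard constraints instead.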
