Any to Full: Prompting Depth Anything for Depth Completion in One Stage

Zhiyuan Zhou, Ruofeng Liu, Taichi Liu, Weijian Zuo, Shanshan Wang, Zhiqing Hong, Desheng Zhang

TL;DR

This work presents Any2Full, a one-stage, domain-general, and pattern-agnostic framework that reformulates depth completion as a scale-prompting adaptation of a pretrained MDE model, and designs a Scale-Aware Prompt Encoder to handle varying depth sparsity levels and irregular spatial distributions.

Abstract

Accurate, dense depth estimation is crucial for robotic perception, but commodity sensors often yield sparse or incomplete measurements due to hardware limitations. Existing RGBD-fused depth completion methods learn priors jointly conditioned on the training RGB distribution and specific depth patterns, limiting domain generalization and robustness to various depth patterns. Recent efforts leverage monocular depth estimation (MDE) models to introduce domain-general geometric priors, but current two-stage integration strategies that rely on explicit relative-to-metric alignment incur additional computation and introduce structured distortions. To this end, we present Any2Full, a one-stage, domain-general, and pattern-agnostic framework that reformulates completion as a scale-prompting adaptation of a pretrained MDE model. To address varying depth sparsity levels and irregular spatial distributions, we design a Scale-Aware Prompt Encoder. It distills scale cues from sparse inputs into unified scale prompts, guiding the MDE model toward globally scale-consistent predictions while preserving its geometric priors. Extensive experiments demonstrate that Any2Full achieves superior robustness and efficiency. It outperforms OMNI-DC by 32.2% in average AbsREL and delivers a 1.4× speedup over PriorDA with the same MDE backbone, establishing a new paradigm for universal depth completion. Code and checkpoints are available at https://github.com/zhiyuandaily/Any2Full.
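
For readers unfamiliar with the AbsREL metric quoted above, the following is a minimal sketch of the standard absolute relative error, computed over pixels that have valid ground truth. The function name `abs_rel` and the convention that non-positive ground-truth values mark invalid pixels are assumptions made for illustration; the exact masking and depth-range protocol used in the paper is not stated in this summary.

```python
from typing import Sequence


def abs_rel(pred: Sequence[float], gt: Sequence[float]) -> float:
    """Mean of |pred - gt| / gt over pixels with valid ground truth.

    Assumption for this sketch: a ground-truth value <= 0 marks a pixel
    with no measurement, and such pixels are skipped.
    """
    errors = [abs(p - g) / g for p, g in zip(pred, gt) if g > 0]
    if not errors:
        raise ValueError("no valid ground-truth pixels")
    return sum(errors) / len(errors)


if __name__ == "__main__":
    # Flattened toy depth maps in metres; the last pixel has no ground truth.
    predicted = [1.1, 2.0, 5.0]
    ground_truth = [1.0, 2.0, 0.0]
    print(abs_rel(predicted, ground_truth))
```

Lower values are better, which is why the figure list marks the metric with a downward arrow.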

Paper Structure

This paper contains 27 sections, 12 equations, 14 figures, and 7 tables.

Figures (14)

  • Figure 1: Depth completion recovers dense depth maps from raw measurements and RGB guidance. (a) Traditional two-stage methods predict coarse depth to bridge the gap between sparse input and dense output. (b) Recent approaches integrate monocular depth estimation (MDE) to generate relative depth and explicitly align it with sparse depth, disrupting MDE’s geometric priors. (c) Our framework employs a lightweight prompting mechanism that tightly integrates MDE’s geometric priors, achieving domain-general and pattern-agnostic depth completion in one stage.
  • Figure 2: Overview of Any2Full. (a) Our framework reformulates depth completion as a scale-prompting adaptation of a pretrained MDE model. The normalized raw depth is encoded into scale prompts to modulate MDE features for scale-consistent relative depth prediction, followed by a non-parametric least-squares fit to recover the dense metric depth. (b) The Scale-Aware Prompt Encoder (SAPE) transforms raw depth into unified scale prompts through two hierarchical modules: Local Enrichment anchors scale cues into the MDE latent space, and Global Propagation leverages MDE geometric features as guidance to diffuse scale cues across patches. Finally, Scale Prompt Fusion injects the resulting prompts into the MDE decoder for final prediction.
  • Figure 3: Qualitative comparison under various depth sampling patterns. Black shows missing depths, and red boxes mark key areas. Any2Full demonstrates accurate global geometry, structural consistency, and fine-grained detail preservation across all patterns.
  • Figure 4: Robustness comparison across varying depth ranges. Evaluation is conducted using AbsREL (↓) on NYU-Depth V2 and KITTI DC.
  • Figure 5: Visualization of scale inconsistency. (a) The backbone MDE (Depth Anything) shows varying regional scale factors when aligned to metric depth (red = larger, blue = smaller), while our prediction yields nearly uniform scales consistent with the ground truth. (b) The bottom row shows the region-partitioned RGB input, sparse depth, and our final result, illustrating how region-wise scales are computed.
  • ...and 9 more figures
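
The Figure 2 caption mentions a non-parametric least-squares fit that maps the scale-consistent relative depth to dense metric depth. The sketch below shows one common form of such a fit: a single global scale and shift solved in closed form from the pixels where a raw measurement exists. This is an assumption for illustration only. The caption does not say whether the fit includes a shift term or whether it operates on depth or inverse depth, and the names `fit_scale_shift` and `to_metric` are hypothetical rather than taken from the Any2Full code.

```python
from typing import List, Optional, Sequence, Tuple


def fit_scale_shift(
    rel: Sequence[float], sparse: Sequence[Optional[float]]
) -> Tuple[float, float]:
    """Return (s, t) minimizing sum((s * rel + t - sparse) ** 2).

    Only pixels with a measurement (sparse value is not None) contribute.
    Assumed form: one global affine map in depth space.
    """
    pairs = [(r, d) for r, d in zip(rel, sparse) if d is not None]
    n = len(pairs)
    if n < 2:
        raise ValueError("need at least two measured pixels")
    mean_r = sum(r for r, _ in pairs) / n
    mean_d = sum(d for _, d in pairs) / n
    var_r = sum((r - mean_r) ** 2 for r, _ in pairs)
    if var_r == 0:
        raise ValueError("relative depth is constant at measured pixels")
    cov = sum((r - mean_r) * (d - mean_d) for r, d in pairs)
    s = cov / var_r          # least-squares slope (scale)
    t = mean_d - s * mean_r  # least-squares intercept (shift)
    return s, t


def to_metric(rel: Sequence[float], s: float, t: float) -> List[float]:
    """Apply the fitted scale and shift to every pixel, measured or not."""
    return [s * r + t for r in rel]


if __name__ == "__main__":
    # Flattened toy example: relative depth everywhere, metric depth at 3 pixels.
    relative = [0.0, 0.25, 0.5, 1.0]
    measured = [1.0, None, 2.0, 3.0]
    scale, shift = fit_scale_shift(relative, measured)
    print(scale, shift, to_metric(relative, scale, shift))
```

A single global fit like this recovers accurate metric depth only if the relative prediction is scale-consistent across the image, which is the property the scale prompts are meant to provide. When the scale varies by region, as Figure 5 shows for the unprompted backbone, one scale and shift cannot fit every region at once.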