Robust Depth Super-Resolution via Adaptive Diffusion Sampling

Kun Wang; Yun Zhu; Pan Zhou; Na Zhao

TL;DR

AdaDS tackles the robustness challenge in depth super-resolution by exploiting the contraction property of forward diffusion, which pulls degraded inputs toward a high-quality depth prior. It introduces a two-stage framework: an uncertainty-aware calibration stage that refines the LR depth, and an adaptive diffusion sampling stage that injects timestep-aligned noise so that intermediate latents land within the high-probability region of a pre-trained diffusion model, letting the diffusion prior dominate recovery. In experiments on real-world and synthetic benchmarks, AdaDS shows stronger zero-shot generalization and resilience to diverse degradations and upsampling factors than state-of-the-art methods. The approach offers a practical route to high-quality depth reconstruction in real-world sensing pipelines, with potential extensions to broader depth-related tasks.
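
To make the contraction property concrete, the worked equation below uses the standard DDPM-style forward marginal. The symbols $x_0^{\mathrm{LR}}$ (a refined low-resolution depth latent) and $x_0^{\mathrm{HQ}}$ (its high-quality counterpart) are notation assumed here for illustration and are not taken from the paper.

$$
q(x_t \mid x_0) = \mathcal{N}\!\left(\sqrt{\bar{\alpha}_t}\,x_0,\ (1-\bar{\alpha}_t)\,\mathbf{I}\right)
$$

$$
\mathrm{KL}\!\left(q(x_t \mid x_0^{\mathrm{LR}})\,\Vert\,q(x_t \mid x_0^{\mathrm{HQ}})\right)
= \frac{\bar{\alpha}_t}{2\,(1-\bar{\alpha}_t)}\,\big\lVert x_0^{\mathrm{LR}} - x_0^{\mathrm{HQ}} \big\rVert_2^2
$$

Since $\bar{\alpha}_t$ decreases toward 0 as $t$ grows, the divergence between the two noised distributions shrinks toward 0 for any fixed gap between the inputs. Under this simplified view, a later starting timestep tolerates a less accurate refinement, at the cost of weaker fidelity to the input measurements.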

Abstract

We propose AdaDS, a generalizable framework for depth super-resolution that robustly recovers high-resolution depth maps from arbitrarily degraded low-resolution inputs. Unlike conventional approaches that directly regress depth values and often exhibit artifacts under severe or unknown degradation, AdaDS capitalizes on the contraction property of Gaussian smoothing: as noise accumulates in the forward process, distributional discrepancies between degraded inputs and their pristine high-quality counterparts diminish, ultimately converging to an isotropic Gaussian prior. Leveraging this, AdaDS adaptively selects a starting timestep in the reverse diffusion trajectory based on the estimated refinement uncertainty, and then injects tailored noise to position the intermediate sample within the high-probability region of the target posterior distribution. This strategy ensures inherent robustness, enabling the generative prior of a pre-trained diffusion model to dominate recovery even when upstream estimations are imperfect. Extensive experiments on real-world and synthetic benchmarks demonstrate AdaDS's superior zero-shot generalization and resilience to diverse degradation patterns compared to state-of-the-art methods.
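
The contraction described in the abstract can be checked numerically on a toy problem. The sketch below is not the AdaDS implementation: the linear beta schedule, the 2D means and variances, and all function names are assumptions made for illustration. It pushes a "degraded" and a "clean" diagonal Gaussian through the standard forward process and reports the KL divergence between their noised marginals at several timesteps. It also shows the usual noise-injection step that places a sample at a chosen timestep.

```python
import numpy as np

def linear_alpha_bar(num_steps=1000, beta_start=1e-4, beta_end=2e-2):
    """Cumulative product of (1 - beta_t) for a linear beta schedule (assumed)."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    return np.cumprod(1.0 - betas)

def diffuse_gaussian(mean, var, alpha_bar_t):
    """Marginal of x_t when x_0 ~ N(mean, diag(var)) under the forward process."""
    return np.sqrt(alpha_bar_t) * mean, alpha_bar_t * var + (1.0 - alpha_bar_t)

def kl_diag_gaussians(m1, v1, m2, v2):
    """KL(N(m1, diag(v1)) || N(m2, diag(v2))), summed over dimensions."""
    return 0.5 * np.sum(np.log(v2 / v1) + (v1 + (m1 - m2) ** 2) / v2 - 1.0)

def inject_noise(x0, alpha_bar_t, rng):
    """Sample x_t = sqrt(alpha_bar_t) * x0 + sqrt(1 - alpha_bar_t) * eps."""
    eps = rng.standard_normal(x0.shape)
    return np.sqrt(alpha_bar_t) * x0 + np.sqrt(1.0 - alpha_bar_t) * eps

alpha_bar = linear_alpha_bar()

# Hypothetical toy distributions: a biased, high-variance "degraded" estimate
# and a tighter "clean" target.
degraded_mean, degraded_var = np.array([1.5, -0.5]), np.array([0.30, 0.20])
clean_mean, clean_var = np.array([1.0, 0.0]), np.array([0.05, 0.05])

kls = []
for t in (0, 250, 500, 999):
    dm, dv = diffuse_gaussian(degraded_mean, degraded_var, alpha_bar[t])
    cm, cv = diffuse_gaussian(clean_mean, clean_var, alpha_bar[t])
    kls.append(kl_diag_gaussians(dm, dv, cm, cv))
    print(f"t={t:4d}  alpha_bar={alpha_bar[t]:.4f}  KL={kls[-1]:.6f}")

# Place a coarse estimate at an intermediate timestep before reverse sampling.
rng = np.random.default_rng(0)
x_t = inject_noise(degraded_mean, alpha_bar[500], rng)
```

The printed KL values shrink as `t` grows, which is the behavior the abstract relies on. How AdaDS maps the estimated uncertainty to a specific starting timestep is not reproduced here.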

Paper Structure (13 sections, 1 theorem, 16 equations, 12 figures, 3 tables)

Introduction
Related Work
Depth Diffusion Model Preliminary
AdaDS Framework
Implementation Details
Experiment
Experiment Setup
Comparison with the State-of-the-Art
Ablation Study
Conclusion
Additional Experimental Detail
Additional Experimental Result
Limitation and Future Work

Key Result

Proposition 4.1

For $\bar{\alpha}_t \in (0,1]$, the function $\mathcal{H}(\bar{\alpha}_t)$ admits a unique global maximum.

Figures (12)

Figure 1: We introduce AdaDS, a framework designed to address the persistent generalization challenges in depth super-resolution. Given an arbitrarily degraded low-resolution depth map and its corresponding RGB image, AdaDS employs zero-shot inference to reconstruct a high-resolution metric depth map with fine-grained structural details.
Figure 2: The distribution of the degraded input (with predicted mean and variance from coarse refinement) progressively aligns with the ground-truth distribution due to Gaussian smoothing contraction. Dots indicate the means of the 2D Gaussians.
Figure 3: Overall framework of AdaDS. (a) The first stage produces a refined depth latent along with an estimated uncertainty map reflecting refinement reliability. (b) The second stage blends the coarse refinement with estimated noise, aligning it with the high-probability region of the target high-quality depth posterior. The produced noise is denoised to recover the final depth map via a pre-trained depth diffusion model. $\mathcal{E}_p$, $\mathcal{E}_l$ and $\mathcal{D}_l$ denote the patch embedding, latent encoder and latent decoder, respectively.
Figure 4: As $t$ increases, the reconstructed depth maps exhibit enhanced fidelity and sharper structural details, but they also deviate more from the input depth measurements. All depth maps are visualized using a unified color palette, with RMSE reported in centimeters.
Figure 5: Comparison on real-world benchmarks with arbitrary upsampling factors. The $\delta_{1.05}\uparrow$ metric is reported in this figure.
...and 7 more figures

Theorems & Definitions (2)

Proposition 4.1
proof
