Depth Prompting for Sensor-Agnostic Depth Estimation

Jin-Hwi Park, Chanhwi Jeong, Junoh Lee, Hae-Gon Jeon

TL;DR

This paper tackles depth estimation across diverse sensors, where biases in density, sensing pattern, and scan range limit generalization. It introduces a depth prompt module that fuses a depth prompt with image features to form an adaptive affinity for spatial propagation, and it builds on a foundation model for monocular depth estimation to produce metric-scale depth. Bias tuning keeps most backbone parameters frozen, and a least-squares refinement recovers absolute scale, so the method yields sensor-agnostic depth maps with strong zero-shot generalization. Experiments on NYU and KITTI-DC, plus zero-shot tests on commercial sensors, show more stable results under varying sparsity, patterns, and ranges than state-of-the-art sparse-depth methods.
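The least-squares refinement mentioned above can be illustrated with a minimal sketch. It assumes, as a common choice rather than something confirmed by the summary, that the relative depth from a monocular model is mapped to metric depth by a single scale and shift fitted on pixels with valid sparse measurements. The function names `fit_scale_shift` and `align_to_metric` are hypothetical, and a sparse value of 0 is assumed to mean "no measurement".

```python
def fit_scale_shift(relative, metric):
    """Closed-form least-squares fit of metric ~ s * relative + t.

    Both arguments are flat lists of equal length holding the relative
    depth and the measured metric depth at the same pixels.
    """
    n = len(relative)
    if n < 2:
        raise ValueError("need at least two valid sparse samples")
    mean_r = sum(relative) / n
    mean_m = sum(metric) / n
    var_r = sum((r - mean_r) ** 2 for r in relative)
    if var_r == 0.0:
        raise ValueError("relative depths are constant; scale is undetermined")
    cov = sum((r - mean_r) * (m - mean_m) for r, m in zip(relative, metric))
    s = cov / var_r
    t = mean_m - s * mean_r
    return s, t


def align_to_metric(relative_map, sparse_map):
    """Rescale a 2D relative depth map using sparse metric measurements.

    Assumption: sparse_map has the same shape as relative_map and stores
    0 wherever the sensor returned no measurement.
    """
    rel, met = [], []
    for row_r, row_s in zip(relative_map, sparse_map):
        for r, m in zip(row_r, row_s):
            if m > 0:  # keep only pixels with a valid measurement
                rel.append(r)
                met.append(m)
    s, t = fit_scale_shift(rel, met)
    return [[s * r + t for r in row] for row in relative_map]


# Usage: two sparse points are enough to pin down scale and shift.
relative_map = [[0.1, 0.2], [0.3, 0.4]]
sparse_map = [[1.5, 0.0], [0.0, 3.0]]
aligned = align_to_metric(relative_map, sparse_map)
```

Because the fit uses only the sparse points, its cost does not depend on image resolution. With very few or noisy measurements, a robust variant (for example, discarding outliers before fitting) may be preferable.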

Abstract

Dense depth maps have been used as a key element of visual perception tasks. There have been tremendous efforts to enhance depth quality, ranging from optimization-based to learning-based methods. Despite this sustained progress, their applicability in the real world is limited by systematic measurement biases such as density, sensing pattern, and scan range. It is well known that these biases make it difficult for the methods to generalize. We observe that learning a joint representation for input modalities (e.g., images and depth), which most recent methods adopt, is sensitive to the biases. In this work, we disentangle those modalities to mitigate the biases with prompt engineering. For this, we design a novel depth prompt module that yields the desired feature representation for new depth distributions arising from either sensor types or scene configurations. Our depth prompt can be embedded into foundation models for monocular depth estimation. Through this embedding process, our method frees the pretrained model from the restraint of the depth scan range and enables it to provide absolute-scale depth maps. We demonstrate the effectiveness of our method through extensive evaluations. Source code is publicly available at https://github.com/JinhwiPark/DepthPrompting.

Paper Structure

This paper contains 16 sections, 8 equations, 4 figures, and 7 tables.

Figures (4)

  • Figure 1: An overview of our depth prompting for sensor-agnostic depth estimation. Leveraging a foundation model for monocular depth estimation, our framework produces a high-fidelity depth map in metric scale and provides impressive zero/few-shot generality. C.Former indicates CompletionFormer (Zhang et al., 2023). More details and examples are reported in the section "Zero-shot Inference on Commercial Sensors" and in the supplementary materials.
  • Figure 2: Examples of sensor biases. Depth estimation with an active sensor suffers from bias problems, including the fixed density and pattern and the inherent scan range of the sensor used.
  • Figure 3: An overview of the proposed architecture. We design a depth prompt module to construct an adaptive affinity map $A_{ada}$, which guides the propagation of given depth information.
  • Figure 4: Qualitative results under changes in measurement pattern, sparsity, and scan range. The first row shows the images, example inputs from the training phase, the sparse depths used in the test phase, and the ground truth (GT).
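
The Figure 3 caption describes an affinity map that guides the propagation of the given depth information. The sketch below shows a generic affinity-guided spatial propagation step on a small grid. It is not the paper's exact formulation: the 4-neighbour layout, the normalisation, the re-injection of sparse measurements, and the names `propagate` and `OFFSETS` are all assumptions made for illustration, and in the paper the affinity would come from the depth prompt module rather than being supplied by hand.

```python
# Neighbour order assumed for each affinity tuple: up, down, left, right.
OFFSETS = ((-1, 0), (1, 0), (0, -1), (0, 1))


def propagate(depth, affinity, sparse, steps=3):
    """Refine an initial depth map with affinity-weighted neighbour averaging.

    depth:    2D list with the initial dense depth.
    affinity: 2D list of 4-tuples, one weight per neighbour in OFFSETS order.
    sparse:   2D list of measurements, 0 where none is available (assumption).
    """
    h, w = len(depth), len(depth[0])
    # Start from the initial depth with the measurements written in.
    current = [
        [sparse[y][x] if sparse[y][x] > 0 else depth[y][x] for x in range(w)]
        for y in range(h)
    ]
    for _ in range(steps):
        nxt = [[0.0] * w for _ in range(h)]
        for y in range(h):
            for x in range(w):
                if sparse[y][x] > 0:
                    # Measured pixels stay anchored to the sensor value.
                    nxt[y][x] = sparse[y][x]
                    continue
                weights = affinity[y][x]
                # Keep the total neighbour weight at or below 1 for stability.
                norm = max(1.0, sum(abs(a) for a in weights))
                acc, used = 0.0, 0.0
                for (dy, dx), a in zip(OFFSETS, weights):
                    ny, nx = y + dy, x + dx
                    if 0 <= ny < h and 0 <= nx < w:
                        acc += (a / norm) * current[ny][nx]
                        used += a / norm
                # The centre pixel takes whatever weight the neighbours left.
                nxt[y][x] = (1.0 - used) * current[y][x] + acc
        current = nxt
    return current


# Usage: one measured pixel on the left pulls its neighbour toward 2.0.
depth = [[1.0, 1.0, 1.0]]
sparse = [[2.0, 0.0, 0.0]]
affinity = [[(0.0, 0.0, 0.5, 0.5)] * 3]
refined = propagate(depth, affinity, sparse, steps=1)
```

Running more steps spreads each measurement further across the map; the affinity weights decide how strongly and in which direction it spreads.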