DCDepth: Progressive Monocular Depth Estimation in Discrete Cosine Domain

Kun Wang, Zhiqiang Yan, Junkai Fan, Wanlu Zhu, Xiang Li, Jun Li, Jian Yang

TL;DR

DCDepth, a novel framework for the long-standing monocular depth estimation task, estimates the frequency coefficients of depth patches after transforming them into the discrete cosine domain, which allows for the modeling of local depth correlations within each patch.

Abstract

In this paper, we introduce DCDepth, a novel framework for the long-standing monocular depth estimation task. Moving beyond conventional pixel-wise depth estimation in the spatial domain, our approach estimates the frequency coefficients of depth patches after transforming them into the discrete cosine domain. This unique formulation allows for the modeling of local depth correlations within each patch. Crucially, the frequency transformation segregates the depth information into various frequency components, with low-frequency components encapsulating the core scene structure and high-frequency components detailing the finer aspects. This decomposition forms the basis of our progressive strategy, which begins with the prediction of low-frequency components to establish a global scene context, followed by successive refinement of local details through the prediction of higher-frequency components. We conduct comprehensive experiments on NYU-Depth-V2, TOFDC, and KITTI datasets, and demonstrate the state-of-the-art performance of DCDepth. Code is available at https://github.com/w2kun/DCDepth.
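The low-to-high frequency decomposition described in the abstract can be illustrated on a single depth patch. The sketch below is not the DCDepth network: it computes the DCT coefficients directly from a synthetic patch rather than predicting them, and it groups coefficients by the diagonal index u + v, which is an assumption made here for illustration. The function names `dct_matrix` and `progressive_reconstructions` and the synthetic patch are likewise hypothetical.

```python
import numpy as np


def dct_matrix(s: int) -> np.ndarray:
    """Return the orthonormal DCT-II basis matrix of size s x s."""
    k = np.arange(s)[:, None]  # frequency index (rows)
    n = np.arange(s)[None, :]  # spatial index (columns)
    mat = np.sqrt(2.0 / s) * np.cos(np.pi * (2 * n + 1) * k / (2 * s))
    mat[0, :] = np.sqrt(1.0 / s)  # the DC row uses a different scale
    return mat


def progressive_reconstructions(patch: np.ndarray, cutoffs: list) -> list:
    """Reconstruct a square patch from coefficients with u + v <= cutoff.

    Grouping by u + v is an illustrative assumption, not necessarily
    the grouping used in the paper.
    """
    s = patch.shape[0]
    d = dct_matrix(s)
    coeffs = d @ patch @ d.T  # forward 2D DCT
    u, v = np.meshgrid(np.arange(s), np.arange(s), indexing="ij")
    order = u + v  # 0 is the DC term, 2 * (s - 1) the highest frequency
    results = []
    for cutoff in cutoffs:
        kept = np.where(order <= cutoff, coeffs, 0.0)
        results.append(d.T @ kept @ d)  # inverse 2D DCT
    return results


# Synthetic 8 x 8 depth patch: a tilted plane plus a small texture term.
s = 8
yy, xx = np.meshgrid(np.arange(s), np.arange(s), indexing="ij")
patch = 2.0 + 0.10 * xx + 0.05 * yy + 0.02 * np.sin(3.0 * xx) * np.cos(2.0 * yy)

cutoffs = [0, 1, 3, 7, 2 * (s - 1)]
errors = [
    float(np.linalg.norm(rec - patch))
    for rec in progressive_reconstructions(patch, cutoffs)
]
for c, e in zip(cutoffs, errors):
    print(f"cutoff {c:2d}: L2 error {e:.6f}")
```

Because the transform is orthonormal, the L2 error equals the norm of the discarded coefficients, so it cannot increase as more frequency components are added. This is the property that makes a coarse-to-fine prediction order natural.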

Paper Structure

This paper contains 27 sections, 10 equations, 6 figures, 8 tables.

Figures (6)

  • Figure 1: Progressive estimation scheme. For an input image of size $H\times W$, DCDepth estimates the DCT coefficients of each $S\times S$ depth patch. The prediction follows a global-to-local strategy: lower-frequency components are estimated first to capture the global scene structure, then higher-frequency components are estimated to enhance local details while the lower-frequency estimates are refined. Estimation is carried out at $\frac{H}{S}\times \frac{W}{S}$ resolution, and the spatial-domain depth is recovered through the inverse DCT.
  • Figure 2: Evolution of intermediate depth estimations. We report several intermediate depth estimation results to illustrate our progressive estimation scheme.
  • Figure 3: DCDepth framework overview. The DCT-based downsampling strategy is shown in the bottom-left corner, where $R$ and $r$ denote the downsampling factor and the channel reduction rate, respectively. The central section details the iterative process of PPH, with $N$ indicating the number of iterative steps. The frequency encoder used by PPH is illustrated in the box on the right.
  • Figure 4: Qualitative depth comparison on the NYU-Depth-V2 dataset. The white boxes highlight the regions where our method achieves more accurate predictions.
  • Figure 5: Qualitative depth comparison on the TOFDC dataset.
  • ...and 1 more figures
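
The layout described in the Figure 1 caption, where coefficients live on an $\frac{H}{S}\times \frac{W}{S}$ grid and the inverse DCT recovers the $H\times W$ depth map, can be sketched as follows. This is a minimal illustration under stated assumptions: the coefficients are computed from a random depth map rather than predicted by a network, the array layout `(H/S, W/S, S, S)` is chosen here for convenience, and the helper names are hypothetical.

```python
import numpy as np


def dct_matrix(s: int) -> np.ndarray:
    """Return the orthonormal DCT-II basis matrix of size s x s."""
    k = np.arange(s)[:, None]
    n = np.arange(s)[None, :]
    mat = np.sqrt(2.0 / s) * np.cos(np.pi * (2 * n + 1) * k / (2 * s))
    mat[0, :] = np.sqrt(1.0 / s)
    return mat


def depth_to_patch_coeffs(depth: np.ndarray, s: int) -> np.ndarray:
    """Split an (H, W) depth map into S x S patches and apply a 2D DCT to each.

    Returns an array of shape (H/S, W/S, S, S). H and W must be multiples of S.
    """
    h, w = depth.shape
    d = dct_matrix(s)
    patches = depth.reshape(h // s, s, w // s, s).transpose(0, 2, 1, 3)
    return d @ patches @ d.T  # matmul broadcasts over the patch grid


def patch_coeffs_to_depth(coeffs: np.ndarray) -> np.ndarray:
    """Invert depth_to_patch_coeffs: per-patch inverse DCT, then reassembly."""
    hp, wp, s, _ = coeffs.shape
    d = dct_matrix(s)
    patches = d.T @ coeffs @ d
    return patches.transpose(0, 2, 1, 3).reshape(hp * s, wp * s)


# Hypothetical sizes: a 16 x 24 depth map with patch size S = 8.
rng = np.random.default_rng(0)
depth = rng.uniform(0.5, 10.0, size=(16, 24))
coeffs = depth_to_patch_coeffs(depth, 8)
restored = patch_coeffs_to_depth(coeffs)
print(coeffs.shape, restored.shape)
```

With this orthonormal scaling, the zero-frequency coefficient of each patch equals S times the mean depth of that patch, so the lowest-frequency map alone already gives a coarse depth image at $\frac{H}{S}\times \frac{W}{S}$ resolution.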