Progressive Depth Decoupling and Modulating for Flexible Depth Completion

Zhiwen Yang, Jiehua Zhang, Liang Li, Chenggang Yan, Yaoqi Sun, Haibing Yin

TL;DR

The paper tackles image-guided depth completion from sparse depth, addressing the poor generalization caused by fixed depth priors. It introduces a progressive depth decoupling and modulating framework that uses a Bins Initializing Module (BIM) to capture scene-wide depth distribution priors from the sparse depth map, and then progressively refines depth bins and probability representations through bidirectional transformer-based interactions between a depth decoupling branch and a depth modulating branch, guided by multi-scale supervision. The approach achieves competitive or state-of-the-art results on NYU-Depth-v2, KITTI, and ScanNet-v2, is robust across varying depth sampling patterns, and produces depth predictions at multiple scales. Its main contribution is combining distribution priors drawn from the sparse data with a coarse-to-fine, interactive design, which makes it a candidate for robotics and perception tasks.

Abstract

Image-guided depth completion aims to generate a dense depth map from sparse LiDAR data and an RGB image. Recent methods have shown promising performance by reformulating it as a classification problem with two sub-tasks: depth discretization and probability prediction. They divide the depth range into several discrete depth values that act as depth categories and serve as priors for the scene depth distribution. However, previous depth discretization methods are easily affected by variations in depth distribution across scenes, resulting in suboptimal scene depth distribution priors. To address this problem, we propose a progressive depth decoupling and modulating network, which incrementally decouples the depth range into bins and adaptively generates multi-scale dense depth maps in multiple stages. Specifically, we first design a Bins Initializing Module (BIM) that constructs the seed bins by exploring the depth distribution information within a sparse depth map, adapting to variations in depth distribution. Then, we devise an incremental depth decoupling branch to progressively refine the depth distribution information from global to local. Meanwhile, an adaptive depth modulating branch is developed to progressively improve the probability representation from coarse-grained to fine-grained. Bi-directional information interactions between these two branches (sub-tasks) are proposed so that each branch complements the other. Further, we introduce a multi-scale supervision mechanism to learn the depth distribution information in latent features and enhance the adaptation capability across different scenes. Experimental results on public datasets demonstrate that our method outperforms state-of-the-art methods. The code will be open-sourced at [this https URL](https://github.com/Cisse-away/PDDM).
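
The classification view described in the abstract (discretize the depth range into bins, then predict a per-pixel probability over those bins) can be illustrated with a small NumPy sketch. This is not the authors' implementation: it assumes the common adaptive-bins convention in which bin centers are the midpoints of normalized bin widths and the final depth is the probability-weighted sum of centers. The function names `bin_centers`, `softmax`, and `expected_depth` are hypothetical.

```python
import numpy as np

def bin_centers(widths, d_min, d_max):
    """Turn positive bin widths into bin centers covering [d_min, d_max].

    Assumption: widths are normalized to sum to 1 and centers are bin midpoints.
    """
    widths = np.asarray(widths, dtype=np.float64)
    widths = widths / widths.sum()  # normalize so the widths sum to 1
    edges = d_min + (d_max - d_min) * np.concatenate(([0.0], np.cumsum(widths)))
    return 0.5 * (edges[:-1] + edges[1:])  # midpoint of each bin

def softmax(logits, axis=-1):
    """Numerically stable softmax along the given axis."""
    shifted = logits - logits.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)

def expected_depth(logits, centers):
    """Per-pixel depth as the probability-weighted sum of bin centers.

    logits: array of shape (H, W, K); centers: array of shape (K,).
    Returns an array of shape (H, W).
    """
    probs = softmax(np.asarray(logits, dtype=np.float64), axis=-1)
    return probs @ centers

# Usage: three bins over a 0-8 m range, uniform probabilities at every pixel.
centers = bin_centers([1.0, 1.0, 2.0], d_min=0.0, d_max=8.0)
depth = expected_depth(np.zeros((2, 2, 3)), centers)
```

In this formulation the bins carry the scene-level prior and the probabilities carry the per-pixel evidence, which is why a poor choice of bins limits accuracy regardless of how good the probability prediction is.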

Paper Structure (37 sections, 19 equations, 22 figures, 12 tables)

Figures (22)

  • Figure 1: Illustration of our progressive depth decoupling and modulating method. Top: the input of the model and the ground truth. Bottom: an overview of the multi-scale modulated depth maps and decoupled bin partitions. Here, "min" and "max" indicate the depth boundaries of the dataset.
  • Figure 2: Overview of the proposed network architecture, which consists of an encoder, a depth modulating branch, and a depth decoupling branch. The input to the network is an RGB image and a sparse depth map, and the output of each stage is a dense depth map obtained by combining the results of these two branches via the depth prediction equation.
  • Figure 3: The comparison of the depth distribution between ground truth and sparse depth map. (a) and (b) show the depth distribution of a single image. (c) and (d) represent the depth distribution in the whole NYU-Depth-v2 dataset. The horizontal axis denotes the depth value, and the vertical axis is the percentage of pixels for each depth value out of the total number of pixels.
  • Figure 4: An overview of our BIM architecture, which takes a sparse depth map as input and generates the seed bins.
  • Figure 5: An overview of our transformer block architecture, which receives the bin embedding and the projected depth feature as input; its output is sent along three paths to promote depth modulating, refine the bin embedding, and generate bin centers.
  • ...and 17 more figures
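
The captions of Figures 3 and 4 rest on the idea that the depth distribution of the sparse map can seed the bins. The sketch below illustrates that idea only. BIM itself is a learned module whose internals are not given in this excerpt, so a plain quantile rule is swapped in as a stand-in. The function name `seed_bin_edges_from_sparse` and the convention that zero marks a missing measurement are assumptions.

```python
import numpy as np

def seed_bin_edges_from_sparse(sparse_depth, num_bins, d_min, d_max):
    """Hypothetical, non-learned stand-in for seed-bin construction.

    Places bin edges at quantiles of the valid sparse depths, so that
    densely populated depth ranges receive narrower bins.
    """
    sparse_depth = np.asarray(sparse_depth, dtype=np.float64)
    valid = sparse_depth[sparse_depth > 0]  # assumption: 0 means "no measurement"
    valid = np.clip(valid, d_min, d_max)
    if valid.size == 0:
        # No measurements: fall back to uniform bins over the full range.
        return np.linspace(d_min, d_max, num_bins + 1)
    edges = np.quantile(valid, np.linspace(0.0, 1.0, num_bins + 1))
    edges[0], edges[-1] = d_min, d_max  # always cover the full dataset range
    return edges

# Usage: four valid measurements in a 4x4 sparse map, two seed bins over 0-10 m.
sparse = np.zeros((4, 4))
sparse[0, 0], sparse[1, 2], sparse[2, 1], sparse[3, 3] = 1.0, 2.0, 3.0, 4.0
edges = seed_bin_edges_from_sparse(sparse, num_bins=2, d_min=0.0, d_max=10.0)
```

A fixed quantile rule like this cannot be refined by training, which is one reason a learned initialization followed by progressive refinement is attractive.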