Masked Spatial Propagation Network for Sparsity-Adaptive Depth Refinement

Jinyoung Jun, Jae-Han Lee, Chang-Su Kim

TL;DR

This work tackles depth completion under varying sparsity by introducing sparsity-adaptive depth refinement (SDR) and a masked spatial propagation network (MSPN). The SDR framework combines an off-the-shelf monocular depth estimator, a guidance network, and MSPN to iteratively refine depth maps while updating a propagation mask, enabling robust performance when the number of sparse depth points changes. MSPN uses a masked attention mechanism over pixel neighborhoods and adaptive iteration counts to propagate information from sparse measurements to the full image, achieving state-of-the-art results on both SDR and traditional depth completion benchmarks (NYUv2 and KITTI). The approach generalizes across different monocular estimators and datasets, and provides practical robustness for real-world depth sensing where sparsity is often variable and unpredictable.
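The propagate-and-grow loop described above can be illustrated with a deliberately simplified NumPy sketch. It is not MSPN: there is no guidance network and no learned attention, only a uniform average over the valid pixels in each 3x3 window. Two further choices are assumptions made for illustration. First, the sketch propagates the residual between the sparse measurements and the monocular estimate, so the monocular depth keeps supplying the scene structure and the sparse points only correct it. Second, it resets measured pixels to their measured values after every step. The hypothetical functions `propagate_step` and `refine` stop once the mask covers the whole image, so the number of iterations adapts to how sparse the input is.

```python
import numpy as np

def propagate_step(residual, mask, k=3):
    """One simplified masked propagation step over a k x k window.

    Each pixel takes the mean of the valid (mask == 1) values in its window.
    Pixels with no valid neighbour keep their value and stay masked out,
    so the mask grows by k // 2 pixels per step.
    """
    h, w = residual.shape
    r = k // 2
    val_p = np.pad(residual * mask, r)  # zero padding; padded pixels are never valid
    msk_p = np.pad(mask, r)
    total = np.zeros((h, w))
    count = np.zeros((h, w))
    for dy in range(k):
        for dx in range(k):
            total += val_p[dy:dy + h, dx:dx + w]
            count += msk_p[dy:dy + h, dx:dx + w]
    new_residual = np.where(count > 0, total / np.maximum(count, 1), residual)
    new_mask = (count > 0).astype(float)
    return new_residual, new_mask

def refine(mono, sparse, max_iters=100):
    """Refine a monocular depth map with sparse depths (0 = no measurement)."""
    valid = sparse > 0
    mask = valid.astype(float)                      # initial mask: 1 where depth is present
    residual = np.where(valid, sparse - mono, 0.0)  # correction to the monocular estimate
    iters = 0
    while mask.min() < 1 and iters < max_iters:
        residual, mask = propagate_step(residual, mask)
        residual[valid] = (sparse - mono)[valid]    # keep measured pixels fixed (assumption)
        iters += 1
    return mono + residual, mask, iters
```

For a 5x5 image with a single measurement in one corner, `refine` needs 4 steps to cover the image; with one measurement in each corner it needs 2, because each 3x3 step grows the mask by one pixel in every direction, diagonals included.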

Abstract

The main function of depth completion is to compensate for an insufficient and unpredictable number of sparse depth measurements from hardware sensors. However, existing research on depth completion assumes that the sparsity (the number of points or LiDAR lines) is fixed across training and testing. Hence, completion performance drops severely when the number of sparse depths changes significantly. To address this issue, we propose the sparsity-adaptive depth refinement (SDR) framework, which refines monocular depth estimates using sparse depth points. For SDR, we propose the masked spatial propagation network (MSPN), which handles a varying number of sparse depths effectively by gradually propagating sparse depth information throughout the entire depth map. Experimental results demonstrate that MSPN achieves state-of-the-art performance in both SDR and conventional depth completion scenarios.
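To train or evaluate under varying sparsity, sparse inputs are often simulated by sub-sampling a dense ground-truth depth map. The sketch below assumes uniform random sampling of valid pixels; the paper's exact sampling protocol is not described in this section. The hypothetical helper `sample_sparse` also builds the initial mask described in the Figure 1 caption, 1 where a depth value is present and 0 otherwise. Drawing a different `num_points` for each sample is one simple way to expose a model to many sparsity levels.

```python
import numpy as np

def sample_sparse(gt_depth, num_points, rng):
    """Simulate a sparse depth map by keeping `num_points` random valid pixels.

    Returns (sparse, mask): sparse is 0 where no measurement is kept, and
    mask is 1 where a depth value is present and 0 otherwise.
    """
    valid_idx = np.flatnonzero(gt_depth > 0)  # flat indices of pixels with ground truth
    k = min(num_points, valid_idx.size)       # cannot keep more points than exist
    chosen = rng.choice(valid_idx, size=k, replace=False)
    sparse = np.zeros_like(gt_depth)
    sparse.flat[chosen] = gt_depth.flat[chosen]
    mask = (sparse > 0).astype(np.float32)
    return sparse, mask

# Varying sparsity: draw a different number of points for each sample.
rng = np.random.default_rng(0)
gt = rng.uniform(0.5, 10.0, size=(48, 64))
for n in (5, 50, 500):
    sparse, mask = sample_sparse(gt, n, rng)
    print(n, int(mask.sum()))
```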

Paper Structure (21 sections, 15 equations, 11 figures, 4 tables)

Figures (11)

  • Figure 1: Illustration of the masked spatial propagation process in the proposed MSPN. The initial mask is obtained from sparse depths, assigned 1 if depth values are present and 0 otherwise. For easier comparison, error maps are also provided for each depth map, in which brighter pixels correspond to larger errors. MSPN updates depth maps and masks gradually to generate the final refined depth map.
  • Figure 2: An overview of the proposed SDR framework.
  • Figure 3: Illustration of the pixel-to-window attention process and the generation of ${\mathbf{R}}^n$. ${\mathbf{M}}^{n+1}$ is generated in the same manner.
  • Figure 4: SDR results on NYUv2. For each depth map, the corresponding error map is provided below, in which brighter pixels represent larger errors.
  • Figure 5: Comparison of SDR performance on NYUv2.
  • ...and 6 more figures
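The pixel-to-window attention named in the Figure 3 caption can be sketched roughly as follows. This is an illustrative assumption, not the authors' implementation: `query` and `key` stand in for per-pixel features from the guidance network, scores are scaled dot products within a k x k window, and neighbours whose mask value is 0 are excluded from the softmax, so only pixels that already carry depth information contribute. The new mask here simply marks a pixel valid once any neighbour in its window is valid; the caption states that ${\mathbf{M}}^{n+1}$ is generated in the same manner as ${\mathbf{R}}^n$, which this sketch does not reproduce. `masked_window_attention` is a hypothetical name.

```python
import numpy as np

def masked_window_attention(query, key, value, mask, k=3):
    """One masked pixel-to-window attention step (simplified sketch).

    query, key: (C, H, W) per-pixel features (stand-ins for guidance features)
    value:      (H, W) current depth map
    mask:       (H, W) array in {0, 1}; 1 = pixel carries depth information
    Returns the updated depth map and the updated mask.
    """
    c, h, w = query.shape
    r = k // 2
    key_p = np.pad(key, ((0, 0), (r, r), (r, r)))  # zero padding; pads are masked out
    val_p = np.pad(value, r)
    msk_p = np.pad(mask, r)
    logits, vals, msks = [], [], []
    for dy in range(k):
        for dx in range(k):
            # Scaled dot product between each pixel's query and one window neighbour's key.
            kn = key_p[:, dy:dy + h, dx:dx + w]
            logits.append((query * kn).sum(axis=0) / np.sqrt(c))
            vals.append(val_p[dy:dy + h, dx:dx + w])
            msks.append(msk_p[dy:dy + h, dx:dx + w])
    logits, vals, msks = np.stack(logits), np.stack(vals), np.stack(msks)  # (k*k, H, W)
    valid = msks > 0
    has_valid = valid.any(axis=0)
    # Masked softmax over the window: invalid neighbours get zero weight.
    logits = np.where(valid, logits, -np.inf)
    peak = np.where(has_valid, logits.max(axis=0), 0.0)
    weights = np.exp(logits - peak)
    denom = np.maximum(weights.sum(axis=0), 1e-12)
    attended = (weights * vals).sum(axis=0) / denom
    # Pixels with no valid neighbour keep their previous depth and stay masked out.
    new_value = np.where(has_valid, attended, value)
    new_mask = has_valid.astype(mask.dtype)
    return new_value, new_mask
```

Setting all features to zero gives every valid neighbour the same weight, which reduces the step to a masked mean; with informative features, neighbours whose keys better match a pixel's query dominate the update.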