Mask-adaptive Gated Convolution and Bi-directional Progressive Fusion Network for Depth Completion

Tingxuan Huang, Jiacheng Miao, Shizhuo Deng, Tong, Dongyue Chen

TL;DR

Depth completion for indoor scenes with missing depth pixels is challenging due to error propagation in standard convolutions. The authors introduce MagaConv, a mask-aware gated convolution that modulates kernel usage via iteratively updated masks, and Bid-AP, a global bi-directional projection for RGB-D fusion within an encoder–decoder framework. A Structure-Consistency loss complements MSE to preserve depth edges, and ablations confirm the complementary benefits of MagaConv, Bid-AP, and the loss terms. On NYU-Depth V2, DIML, and SUN RGB-D, the approach achieves state-of-the-art or competitive accuracy with favorable compute efficiency, indicating strong potential for real-time indoor depth completion. Overall, the work advances robust depth reconstruction under missing data through mask-guided feature extraction and globally aligned RGB-D fusion.

Abstract

Depth completion is a critical task for handling depth images with missing pixels, which can negatively impact further applications. Recent approaches have utilized Convolutional Neural Networks (CNNs) to reconstruct depth images with the assistance of color images. However, vanilla convolution has non-negligible drawbacks in handling missing pixels. To solve this problem, we propose a new model for depth completion based on an encoder-decoder structure. Our model introduces two key components: the Mask-adaptive Gated Convolution (MagaConv) architecture and the Bi-directional Progressive Fusion (BP-Fusion) module. The MagaConv architecture is designed to acquire precise depth features by modulating convolution operations with iteratively updated masks, while the BP-Fusion module progressively integrates depth and color features, utilizing consecutive bi-directional fusion structures in a global perspective. Extensive experiments on popular benchmarks, including NYU-Depth V2, DIML, and SUN RGB-D, demonstrate the superiority of our model over state-of-the-art methods. We achieved remarkable performance in completing depth maps and outperformed existing approaches in terms of accuracy and reliability.
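
The phrase "iteratively updated masks" can be illustrated with a small sketch. The code below is not MagaConv: it is a plain partial-convolution-style pass with a fixed averaging kernel, written in pure Python. The names `masked_conv_step` and `count_missing` are hypothetical helpers, and the 5 x 5 test map is made up for illustration. It only shows the general mechanism that a mask-aware layer normalises over valid pixels and then marks a pixel as valid once its window has seen valid data, so the hole shrinks from pass to pass.

```python
def masked_conv_step(depth, mask, k=3):
    """One mask-normalised averaging pass with a k x k window.

    depth: list of rows of floats; mask: same shape, 1 = valid, 0 = missing.
    Returns (new_depth, new_mask). Assumption: a fixed mean kernel stands in
    for learned convolution weights.
    """
    h, w = len(depth), len(depth[0])
    r = k // 2
    new_depth = [[0.0] * w for _ in range(h)]
    new_mask = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            total, valid = 0.0, 0
            # Accumulate only the valid pixels inside the (clipped) window.
            for yy in range(max(0, y - r), min(h, y + r + 1)):
                for xx in range(max(0, x - r), min(w, x + r + 1)):
                    if mask[yy][xx]:
                        total += depth[yy][xx]
                        valid += 1
            if valid:
                # Normalise by the number of valid pixels, then mark as valid.
                new_depth[y][x] = total / valid
                new_mask[y][x] = 1
    return new_depth, new_mask


def count_missing(mask):
    """Number of pixels still flagged as missing."""
    return sum(row.count(0) for row in mask)


# Toy input: constant depth 2.0 with a 3 x 3 hole in the middle of a 5 x 5 map.
depth = [[2.0] * 5 for _ in range(5)]
mask = [[1] * 5 for _ in range(5)]
for y in range(1, 4):
    for x in range(1, 4):
        depth[y][x] = 0.0
        mask[y][x] = 0

missing_per_pass = [count_missing(mask)]
for _ in range(2):
    depth, mask = masked_conv_step(depth, mask)
    missing_per_pass.append(count_missing(mask))

print(missing_per_pass)  # [9, 1, 0]
```

In this toy case the hole closes from the outside in: the first pass fills every missing pixel that touches the valid border, and the second pass fills the remaining centre pixel.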

Paper Structure (16 sections, 9 equations, 7 figures, 3 tables)

Figures (7)

  • Figure 1: Comparison between Partial Convolution (PConv) and MagaConv, both designed to encode incomplete depth images using associated masks. Here, $X^t$ is the input/output feature at encoding step $t$, and $W_i$ denotes the convolution kernels applied at position $i$. While PConv ensures that the output depends only on valid pixels, it overlooks the problem of using the same kernels for different invalidity levels, since masking may remove crucial parameters in $W$. MagaConv addresses this by selecting kernels tailored to specific invalid patterns.
  • Figure 2: Pipeline of our depth completion model, including the proposed MagaConv architecture and Bid-AP module. $M_{(b, l)}$ represents the adaptive masks, where $b$ and $l$ denote the block and layer, respectively.
  • Figure 3: Details of the MagaConv and M-Layer. Each M-Layer consists of multiple MagaConv heads to facilitate feature extraction using diverse kernel sizes. Specifically, we implement three parallel heads with kernel sizes of 3, 5, and 7 for practical application.
  • Figure 4: Details of the Bid-AP and CMAP.
  • Figure 5: Visualizations of typical features demonstrating effectiveness on the DIML dataset. The missing area in (b) shrinks to (c) after a MagaConv. The green "coarsely complete depth" with unclear boundaries in (d) and the red "depth-irrelevant" features in (e) disappear after being combined by a Bid-AP (f).
  • ...and 2 more figures
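
The captions for Figures 1 and 3 mention varying invalidity levels and three parallel heads with kernel sizes 3, 5, and 7. The following is a rough sketch of why several receptive fields can help near holes. It does not reproduce the paper's learned kernels or gating: each "head" here is just a mask-normalised mean, and `window_stats`, `multi_head_estimate`, and the 7 x 7 test map are hypothetical names and data chosen for illustration.

```python
def window_stats(depth, mask, y, x, k):
    """Return (sum of valid depths, valid count, in-bounds count) for a k x k window."""
    h, w = len(depth), len(depth[0])
    r = k // 2
    total, valid, seen = 0.0, 0, 0
    for yy in range(max(0, y - r), min(h, y + r + 1)):
        for xx in range(max(0, x - r), min(w, x + r + 1)):
            seen += 1
            if mask[yy][xx]:
                total += depth[yy][xx]
                valid += 1
    return total, valid, seen


def multi_head_estimate(depth, mask, y, x, sizes=(3, 5, 7)):
    """Average the mask-normalised means of all heads that see valid data.

    Returns (estimate or None, list of valid fractions, one per head).
    Assumption: plain averaging stands in for whatever learned combination
    a real multi-head layer would use.
    """
    means, fractions = [], []
    for k in sizes:
        total, valid, seen = window_stats(depth, mask, y, x, k)
        fractions.append(valid / seen)  # the "invalidity level" this head faces
        if valid:
            means.append(total / valid)
    estimate = sum(means) / len(means) if means else None
    return estimate, fractions


# Toy input: constant depth 3.0 with a 5 x 5 hole in the middle of a 7 x 7 map.
depth = [[3.0] * 7 for _ in range(7)]
mask = [[1] * 7 for _ in range(7)]
for y in range(1, 6):
    for x in range(1, 6):
        depth[y][x] = 0.0
        mask[y][x] = 0

centre_estimate, centre_fractions = multi_head_estimate(depth, mask, 3, 3)
edge_estimate, edge_fractions = multi_head_estimate(depth, mask, 1, 1)

print(centre_fractions)  # only the size-7 head sees any valid pixels
print(edge_fractions)    # all three heads see some valid pixels
```

At the hole centre the size-3 and size-5 windows contain no valid pixels, so only the size-7 head contributes. Near the hole boundary all three heads contribute, each facing a different valid fraction. This is the situation the Figure 1 caption describes as different invalidity levels.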