Completion as Enhancement: A Degradation-Aware Selective Image Guided Network for Depth Completion
Zhiqiang Yan, Zhengxue Wang, Kun Wang, Jun Li, Jian Yang
TL;DR
This work introduces SigNet, which reframes depth completion as depth enhancement: it first densifies sparse depth with non-CNN methods to obtain a coarse depth map, then learns an implicit degradation that links this coarse depth to the target dense depth. A Degradation-Aware Decomposition and Fusion (DADF) module decomposes the degradation in the frequency domain to selectively incorporate high-frequency RGB information, while a Conditional Mamba enables global RGB-D interaction aligned with degradation cues. The model is trained with a reconstruction loss plus a self-supervised degradation loss, encouraging accurate depth recovery and meaningful degradation representations. Across NYUv2, DIML, SUN RGB-D, and TOFDC, SigNet achieves state-of-the-art results with strong generalization and significantly reduced model size and inference time, suggesting practical value for robust RGB-D sensing and scene understanding. Performance drops on the extremely sparse outdoor KITTI data, where auxiliary dense-depth supervision or edge-aware priors may help.
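The summary does not say which non-CNN densification tool is used. As a minimal sketch of the idea only, the block below fills every missing pixel with the value of its nearest valid sample. The function name `densify_nearest`, the convention that 0 marks a missing measurement, and the choice of nearest-neighbor filling are all assumptions made for illustration, not the authors' method.

```python
import numpy as np


def densify_nearest(sparse, invalid=0.0):
    """Fill missing depth pixels with the nearest valid sample (illustrative only).

    Assumption: `invalid` marks pixels with no measurement. This brute-force
    version uses O(H * W * K) memory for K valid samples, so it suits small
    inputs; it is a stand-in for whatever non-CNN tool is actually used.
    """
    valid = sparse != invalid
    if not valid.any():
        raise ValueError("sparse depth map contains no valid samples")

    vy, vx = np.nonzero(valid)          # coordinates of the K valid samples
    vals = sparse[vy, vx]               # their depth values
    h, w = sparse.shape
    yy, xx = np.mgrid[0:h, 0:w]         # coordinates of every pixel

    # Squared distance from every pixel to every valid sample: shape (H*W, K).
    d2 = (yy.reshape(-1, 1) - vy[None, :]) ** 2 + (xx.reshape(-1, 1) - vx[None, :]) ** 2
    nearest = d2.argmin(axis=1)         # index of the closest sample per pixel
    return vals[nearest].reshape(h, w)


# Hypothetical usage: two measurements in a 4x4 map.
sparse = np.zeros((4, 4))
sparse[0, 0] = 1.0
sparse[3, 3] = 5.0
coarse = densify_nearest(sparse)
```

The result is dense but blocky, which is the kind of coarse depth that the enhancement stage is then expected to refine.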
Abstract
In this paper, we introduce the Selective Image Guided Network (SigNet), a novel degradation-aware framework that transforms depth completion into depth enhancement for the first time. Moving beyond direct completion using convolutional neural networks (CNNs), SigNet first densifies sparse depth data through non-CNN densification tools to obtain coarse yet dense depth. This approach eliminates the mismatch and ambiguity caused by direct convolution over irregularly sampled sparse data. Subsequently, SigNet redefines completion as enhancement, establishing a self-supervised degradation bridge between the coarse depth and the target dense depth for effective RGB-D fusion. To achieve this, SigNet leverages the implicit degradation to adaptively select high-frequency components (e.g., edges) of RGB data to compensate for the coarse depth. This degradation is further integrated into a multi-modal conditional Mamba, dynamically generating the state parameters to enable efficient global high-frequency information interaction. We conduct extensive experiments on the NYUv2, DIML, SUN RGB-D, and TOFDC datasets, demonstrating the state-of-the-art (SOTA) performance of SigNet.
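To make "high-frequency components (e.g., edges)" concrete, the following sketch splits a single-channel image into low- and high-frequency parts with a fixed circular mask in the Fourier domain. This is not the learned, degradation-conditioned selection described above: the function name `split_frequencies` and the `cutoff` parameter are hypothetical, and the fixed mask is only an assumed stand-in showing what a frequency decomposition isolates.

```python
import numpy as np


def split_frequencies(img, cutoff=0.1):
    """Split a 2-D image into low- and high-frequency parts (illustrative only).

    Assumption: `cutoff` is a radius in cycles per pixel; everything inside
    the radius counts as low frequency. The abstract describes an adaptive
    selection; here the mask is fixed.
    """
    h, w = img.shape
    fy = np.fft.fftfreq(h)[:, None]     # vertical frequencies
    fx = np.fft.fftfreq(w)[None, :]     # horizontal frequencies
    low_mask = np.sqrt(fy ** 2 + fx ** 2) <= cutoff

    spec = np.fft.fft2(img)
    # The mask is symmetric, so the inverse transforms are real up to rounding.
    low = np.real(np.fft.ifft2(spec * low_mask))
    high = np.real(np.fft.ifft2(spec * ~low_mask))
    return low, high


# Hypothetical usage: a vertical step edge in a 16x16 image.
img = np.zeros((16, 16))
img[:, 8:] = 1.0
low, high = split_frequencies(img, cutoff=0.1)
```

The two parts sum back to the input, and the high-frequency part is largest in magnitude around the step, which is why such components are a natural source of edge detail for sharpening a coarse depth map.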
