RigNet++: Semantic Assisted Repetitive Image Guided Network for Depth Completion

Zhiqiang Yan, Xiang Li, Le Hui, Zhenyu Zhang, Jun Li, Jian Yang

TL;DR

RigNet++ advances depth completion with a semantic assisted repetitive image guided framework. A Dense Repetitive Hourglass Network (DRHN) enhances image guidance, while a Repetitive Guidance (RG) module progressively refines depth, aided by SAM-derived semantic priors and a Region-Aware Spatial Propagation Network (RASPN) for edge-preserving refinement. The work also contributes TOFDC, a smartphone-based dataset that reflects real-world mobile scenarios. Across KITTI, NYUv2, Matterport3D, 3D60, VKITTI, and TOFDC, RigNet++ achieves state-of-the-art performance and generalizes well under varying sparsity and conditions. These results underscore its practical potential for high-fidelity depth completion in autonomous driving, AR/VR, and mobile sensing.

Abstract

Depth completion aims to recover dense depth maps from sparse ones, where color images are often used to facilitate this task. Recent depth methods primarily focus on image guided learning frameworks. However, blurry guidance in the image and unclear structure in the depth still impede their performance. To tackle these challenges, we explore a repetitive design in our image guided network to gradually and sufficiently recover depth values. Specifically, the repetition is embodied in both the image guidance branch and depth generation branch. In the former branch, we design a dense repetitive hourglass network (DRHN) to extract discriminative image features of complex environments, which can provide powerful contextual instruction for depth prediction. In the latter branch, we present a repetitive guidance (RG) module based on dynamic convolution, in which an efficient convolution factorization is proposed to reduce the complexity while modeling high-frequency structures progressively. Furthermore, in the semantic guidance branch, we utilize the well-known large vision model, i.e., segment anything (SAM), to supply RG with semantic prior. In addition, we propose a region-aware spatial propagation network (RASPN) for further depth refinement based on the semantic prior constraint. Finally, we collect a new dataset termed TOFDC for the depth completion task, which is acquired by the time-of-flight (TOF) sensor and the color camera on smartphones. Extensive experiments demonstrate that our method achieves state-of-the-art performance on KITTI, NYUv2, Matterport3D, 3D60, VKITTI, and our TOFDC.
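
The abstract mentions a guidance module built on dynamic convolution with a convolution factorization that reduces complexity. As a rough illustration only, the NumPy sketch below shows one well-known way to factorize a spatially-variant convolution: a per-pixel, per-channel spatial filtering stage followed by a spatially shared cross-channel mix. This is not claimed to be the RG module itself; the function names, tensor layouts, and the assumption that the per-pixel kernels come from image features are all hypothetical.

```python
import numpy as np

def factorized_guided_conv(depth_feat, spatial_kernels, channel_weights):
    """Two-stage (factorized) spatially-variant convolution. Illustrative only.

    depth_feat:      (C, H, W) depth features.
    spatial_kernels: (C, k*k, H, W) per-pixel, per-channel kernels
                     (assumed here to be predicted from image features).
    channel_weights: (C_out, C) cross-channel weights shared by all pixels.
    """
    C, H, W = depth_feat.shape
    k = int(round(np.sqrt(spatial_kernels.shape[1])))
    pad = k // 2
    padded = np.pad(depth_feat, ((0, 0), (pad, pad), (pad, pad)))

    # Stage 1: channel-wise filtering with a different kernel at every pixel.
    filtered = np.zeros((C, H, W), dtype=float)
    idx = 0
    for dy in range(k):
        for dx in range(k):
            filtered += spatial_kernels[:, idx] * padded[:, dy:dy + H, dx:dx + W]
            idx += 1

    # Stage 2: spatially invariant 1x1 mixing across channels.
    return np.einsum("oc,chw->ohw", channel_weights, filtered)

def weight_counts(c_in, c_out, k, h, w):
    """Number of kernel weights for a full vs. factorized dynamic convolution."""
    full = h * w * c_out * c_in * k * k
    factorized = h * w * c_in * k * k + c_out * c_in
    return full, factorized
```

For a sense of scale under these assumptions, `weight_counts(64, 64, 3, 64, 64)` compares roughly 151 million per-pixel weights for the unfactorized form against roughly 2.4 million for the factorized one.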

Paper Structure (27 sections, 15 equations, 18 figures, 10 tables)

This paper contains 27 sections, 15 equations, 18 figures, 10 tables.

Figures (18)

  • Figure 1: To obtain a dense depth Prediction, most existing image guided methods employ (a) tandem models [ma2018self, Cheng2020CSPN, park2020nonlocal, lin2022dynamic, zhang2023cf] or (b) parallel models [zhao2021adaptive, tang2020learning, liu2021fcfr, hu2020PENet, liu2023mff] with various inputs, e.g., Boundary, Normal, Semantic, and RGB-D. By contrast, we propose (c) a dense repetitive mechanism, assisted by the semantic prior of a large vision model, i.e., segment anything (SAM) [kirillov2023segment], to gradually produce refined image and depth Guidance with rich semantic information.
  • Figure 2: Overview of our semantic assisted repetitive image guided network. It mainly consists of an image guidance branch, a semantic guidance branch, and a depth generation branch. Our RG (Fig. 4) produces dense depth by fusing the features of the three branches, while the post-processing RASPN (Fig. 5) further refines the coarse depth using the semantic constraint.
  • Figure 3: An example of our dense repetitive hourglass network (DRHN). B.R.C refers to BN, ReLU, and convolution.
  • Figure 4: Our repetitive guidance (RG) module, which consists of an efficient guidance algorithm (EG) and an adaptive fusion mechanism (AF), where $k$ refers to the number of repetitions.
  • Figure 5: Comparison of SPNs, including SPN [liu2017learning], CSPN [Cheng2020CSPN], and our RASPN. The colored regions in (c) refer to the object classes provided by semantic masks.
  • ...and 13 more figures
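
The Figure 5 caption contrasts standard spatial propagation with a region-aware variant driven by semantic masks. The NumPy sketch below is one possible reading of that idea, not the paper's RASPN: a single propagation step over the 8-neighborhood in which a neighbor contributes only if it carries the same region label as the center pixel. The function name, the affinity layout, the normalization rule, and the use of non-negative integer labels are all assumptions made for illustration.

```python
import numpy as np

# 8-neighbourhood offsets (dy, dx); the centre pixel is handled separately.
OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, -1), (0, 1), (1, -1), (1, 0), (1, 1)]

def region_aware_step(depth, affinity, labels):
    """One propagation step restricted to same-region neighbours. Illustrative only.

    depth:    (H, W) current depth estimate.
    affinity: (8, H, W) non-negative neighbour affinities (assumed given).
    labels:   (H, W) non-negative integer region ids, e.g. from a segmentation mask.
    """
    H, W = depth.shape
    d_pad = np.pad(depth, 1, mode="edge")
    # Pad labels with -1 so out-of-image neighbours never match a real region.
    l_pad = np.pad(labels, 1, mode="constant", constant_values=-1)

    weighted_sum = np.zeros((H, W))
    weight_total = np.zeros((H, W))
    for i, (dy, dx) in enumerate(OFFSETS):
        nb_depth = d_pad[1 + dy:1 + dy + H, 1 + dx:1 + dx + W]
        nb_label = l_pad[1 + dy:1 + dy + H, 1 + dx:1 + dx + W]
        w = affinity[i] * (nb_label == labels)  # zero weight across region borders
        weighted_sum += w * nb_depth
        weight_total += w

    # Rescale so neighbour weights sum to at most 1; the remainder stays on the centre.
    scale = np.maximum(weight_total, 1.0)
    centre_weight = 1.0 - weight_total / scale
    return centre_weight * depth + weighted_sum / scale
```

In this sketch, two adjacent regions with different constant depths stay unchanged after a step, whereas the same step with a single shared label blends depth across the border.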