Transparent Object Depth Completion

Yifan Zhou, Wanli Peng, Zhongyu Yang, He Liu, Yi Sun

TL;DR

This work tackles the challenging problem of depth perception for transparent objects by proposing an end-to-end network that jointly leverages single-view RGB-D depth completion and multi-view depth estimation. A depth-injection module feeds single-view depth information into the multi-view cost volume to correct global depth bias, while a confidence-based refinement fuses $D_{multi}$ and $D_{single}$ into the final depth $\hat{D}$. The method demonstrates state-of-the-art accuracy on ClearPose and TransCG, including scenarios with heavy occlusion and translucent coverings, and analyzes the individual contributions of depth injection and depth refinement through ablation studies. The approach holds practical significance for robust grasping and manipulation of transparent objects, albeit with higher GPU memory requirements compared to some single-view methods.

Abstract

Perceiving transparent objects for grasping and manipulation remains a major challenge: existing robotic grasping methods rely heavily on depth maps, and the unique visual properties of transparent objects leave gaps and inaccuracies in the depth maps captured by depth sensors. To address this issue, we propose an end-to-end network for transparent object depth completion that combines the strengths of single-view RGB-D depth completion and multi-view depth estimation. We also introduce a depth refinement module, based on confidence estimation, that fuses the depth maps predicted by the single-view and multi-view modules and further refines the restored depth map. Extensive experiments on the ClearPose and TransCG datasets demonstrate that our method achieves superior accuracy and robustness compared with state-of-the-art methods in complex scenarios with significant occlusion.
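As a rough illustration of confidence-based fusion, the sketch below blends a single-view and a multi-view depth map with a per-pixel weighted average. The weighting rule, the function name `fuse_depths`, and the `eps` parameter are assumptions made for illustration. In the proposed network the confidences are predicted by the model, and the actual refinement module may well differ from this simple average.

```python
import numpy as np


def fuse_depths(d_single, d_multi, conf_single, conf_multi, eps=1e-8):
    """Blend two depth maps per pixel using non-negative confidence maps.

    Hypothetical helper: a plain confidence-weighted average, not the
    refinement module from the paper.
    """
    d_single = np.asarray(d_single, dtype=np.float64)
    d_multi = np.asarray(d_multi, dtype=np.float64)
    conf_single = np.asarray(conf_single, dtype=np.float64)
    conf_multi = np.asarray(conf_multi, dtype=np.float64)

    weight_sum = conf_single + conf_multi
    # Where both confidences are (near) zero, fall back to an equal blend.
    w_single = np.where(
        weight_sum > eps,
        conf_single / np.maximum(weight_sum, eps),
        0.5,
    )
    return w_single * d_single + (1.0 - w_single) * d_multi


if __name__ == "__main__":
    # Toy 1x2 depth maps in meters; values are made up for the example.
    fused = fuse_depths(
        d_single=[[1.0, 2.0]],
        d_multi=[[3.0, 2.0]],
        conf_single=[[0.75, 0.0]],
        conf_multi=[[0.25, 1.0]],
    )
    print(fused)
```

A pixel where the single-view branch is more confident stays close to the single-view depth, and vice versa, which is the intuition behind fusing $D_{single}$ and $D_{multi}$ into $\hat{D}$.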

Paper Structure (15 sections, 3 equations, 5 figures, 3 tables)

Figures (5)

  • Figure 1: Overview of our proposed method. We predict the depth maps separately from a single RGB-D image and multi-view RGB images. Then, the single-view probability volume is injected into the multi-view depth estimation. Finally, we predict the confidence of these depth maps and refine the restored depth.
  • Figure 2: Visualization of single-view and multi-view depth. The second row shows an enlarged view of each depth map. In the depth maps, lighter gray values indicate greater depth.
  • Figure 3: Qualitative comparison to the state-of-the-art methods.
  • Figure 4: Qualitative results of depth injection.
  • Figure 5: Qualitative results of the depth refinement. In the confidence maps, brighter gray values indicate higher confidence.
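
The Figure 1 caption mentions injecting a single-view probability volume into the multi-view depth estimation. One way to picture this is sketched below, under several assumptions that do not come from the source: the single-view depth is turned into a Gaussian-shaped distribution over depth hypotheses, it is combined with the multi-view distribution by element-wise multiplication, and the final depth is read out as the expected value over hypotheses. All function names and the `sigma` parameter are hypothetical, and the paper's depth-injection module may work quite differently.

```python
import numpy as np


def softmax(x, axis=0):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def single_view_probability_volume(d_single, hypotheses, sigma=0.05):
    """Build a (K, H, W) volume peaked at the single-view depth.

    Assumption: a Gaussian-shaped distribution over the K depth hypotheses.
    """
    diff = hypotheses[:, None, None] - d_single[None, :, :]
    return softmax(-0.5 * (diff / sigma) ** 2, axis=0)


def inject_and_regress(multi_view_scores, p_single, hypotheses, eps=1e-8):
    """Combine both volumes and regress depth as an expected value.

    Assumption: injection is modeled as an element-wise product of the
    two per-pixel distributions, followed by renormalization.
    """
    p_multi = softmax(multi_view_scores, axis=0)
    p = p_multi * p_single + eps
    p = p / p.sum(axis=0, keepdims=True)
    return (p * hypotheses[:, None, None]).sum(axis=0)


if __name__ == "__main__":
    hypotheses = np.linspace(0.5, 1.5, 11)   # candidate depths in meters
    d_single = np.full((2, 2), 1.0)          # toy single-view depth map
    scores = np.zeros((11, 2, 2))
    scores[8] = 2.0                          # multi-view branch biased toward 1.3 m
    p_single = single_view_probability_volume(d_single, hypotheses)
    print(inject_and_regress(scores, p_single, hypotheses))
```

In this toy setup the multi-view scores alone would pull the estimate toward the biased hypothesis, while the injected single-view distribution moves it back toward the single-view depth, which mirrors the stated goal of correcting global depth bias.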