Transparent Object Depth Completion
Yifan Zhou, Wanli Peng, Zhongyu Yang, He Liu, Yi Sun
TL;DR
This work tackles depth perception for transparent objects by proposing an end-to-end network that jointly leverages single-view RGB-D depth completion and multi-view depth estimation. A depth-injection module feeds single-view depth information into the multi-view cost volume to correct global depth bias, while a confidence-based refinement fuses $D_{multi}$ and $D_{single}$ into the final depth $\hat{D}$. The method achieves state-of-the-art accuracy on ClearPose and TransCG, including scenarios with heavy occlusion and translucent coverings, and ablation studies analyze the individual contributions of depth injection and depth refinement. The approach is practically significant for robust grasping and manipulation of transparent objects, although it requires more GPU memory than some single-view methods.
Abstract
The perception of transparent objects for grasping and manipulation remains a major challenge, because existing robotic grasping methods, which rely heavily on depth maps, are not suitable for transparent objects due to their unique visual properties. These properties lead to gaps and inaccuracies in the depth maps that depth sensors capture for transparent objects. To address this issue, we propose an end-to-end network for transparent object depth completion that combines the strengths of single-view RGB-D based depth completion and multi-view depth estimation. Moreover, we introduce a depth refinement module based on confidence estimation that fuses the depth maps predicted by the single-view and multi-view modules, further refining the restored depth map. Extensive experiments on the ClearPose and TransCG datasets demonstrate that our method achieves superior accuracy and robustness compared with state-of-the-art methods in complex scenarios with significant occlusion.
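The idea of confidence-based fusion can be illustrated with a minimal sketch. This is not the paper's refinement module, which is learned end to end and whose exact form is not given here. The sketch only assumes that each branch yields a depth map together with a per-pixel confidence score, and it blends $D_{single}$ and $D_{multi}$ with weights obtained from a two-way softmax over those scores. The function name `fuse_depth` and the confidence inputs `logit_single` and `logit_multi` are hypothetical, introduced for illustration.

```python
import math
from typing import List

DepthMap = List[List[float]]  # row-major H x W grid of per-pixel values


def fuse_depth(
    d_single: DepthMap,
    d_multi: DepthMap,
    logit_single: DepthMap,
    logit_multi: DepthMap,
) -> DepthMap:
    """Blend two depth maps pixel by pixel using confidence scores.

    Assumption (not from the paper): each branch provides an unnormalized
    confidence score per pixel, and the fused depth is the softmax-weighted
    average of the two depth predictions. All four maps are assumed to
    share the same shape.
    """
    fused: DepthMap = []
    for ds_row, dm_row, ls_row, lm_row in zip(
        d_single, d_multi, logit_single, logit_multi
    ):
        row = []
        for ds, dm, ls, lm in zip(ds_row, dm_row, ls_row, lm_row):
            # A softmax over two scores reduces to a sigmoid of their difference.
            w_multi = 1.0 / (1.0 + math.exp(ls - lm))
            row.append(w_multi * dm + (1.0 - w_multi) * ds)
        fused.append(row)
    return fused


if __name__ == "__main__":
    d_single = [[1.0, 2.0]]
    d_multi = [[3.0, 2.0]]
    equal_conf = [[0.0, 0.0]]
    # With equal confidence the result is the plain average of the two maps.
    print(fuse_depth(d_single, d_multi, equal_conf, equal_conf))
```

With equal scores the fused depth is the mean of the two predictions; as one branch's score grows relative to the other, the output approaches that branch's depth. A practical implementation would operate on tensors and predict the confidence maps with a network rather than take them as inputs.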
