HDCNet: A Hybrid Depth Completion Network for Grasping Transparent and Reflective Objects
Guanghu Xie, Mingxu Li, Songwei Wu, Yang Liu, Zongwu Xie, Baoshi Cao, Hong Liu
TL;DR
HDCNet addresses the critical problem of depth perception for transparent and reflective objects by introducing a hybrid depth completion network that fuses RGB-D and depth modalities through a dual-branch Transformer-CNN encoder, a shallow multimodal fusion module, and a bottleneck Transformer-Mamba fusion block. The approach achieves state-of-the-art depth completion on public benchmarks and demonstrates practical gains in robotic grasping tasks, with improvements in grasp success rates for challenging materials. Key contributions include the hierarchical multimodal fusion strategy and the demonstration that combining Transformer, CNN, and Mamba architectures yields robust, globally informed depth estimates. The method's effectiveness across real and synthetic datasets, plus real-world grasping validation, highlights the potential of hybrid fusion architectures for robust perception in complex optical environments.
Abstract
Depth perception of transparent and reflective objects has long been a critical challenge in robotic manipulation. Conventional depth sensors often fail to provide reliable measurements on such surfaces, limiting the performance of robots in perception and grasping tasks. To address this issue, we propose a novel depth completion network, HDCNet, which integrates the complementary strengths of Transformer, CNN, and Mamba architectures. Specifically, the encoder is designed as a dual-branch Transformer-CNN framework to extract modality-specific features. At the shallow layers of the encoder, we introduce a lightweight multimodal fusion module to effectively integrate low-level features. At the network bottleneck, a Transformer-Mamba hybrid fusion module is developed to achieve deep integration of high-level semantic and global contextual information, significantly enhancing depth completion accuracy and robustness. Extensive evaluations on multiple public datasets demonstrate that HDCNet achieves state-of-the-art (SOTA) performance in depth completion tasks. Furthermore, robotic grasping experiments show that HDCNet substantially improves grasp success rates for transparent and reflective objects, achieving up to a 60% increase.
