TDCNet: Transparent Objects Depth Completion with CNN-Transformer Dual-Branch Parallel Network

Xianghui Fan, Chao Ye, Anping Deng, Xiaotian Wu, Mengyang Pan, Hang Yang

TL;DR

This paper tackles the challenging problem of depth completion for transparent objects, where traditional sensors struggle due to refraction and low texture. It introduces TDCNet, a CNN–Transformer parallel dual-branch encoder–decoder that separately processes the original depth map and RGB-D features, then fuses them via a Multiscale Feature Fusion Module to recover complete depth maps. A novel adaptive loss strategy modulates the influence of the smoothing term during training, improving convergence and reducing gradient conflicts. Experiments on TransCG, ClearGrasp, and Omniverse show state-of-the-art performance and strong cross-dataset generalization, with the approach preserving depth edges while filling missing regions, benefiting downstream robotic manipulation tasks.

Abstract

The sensing and manipulation of transparent objects present a critical challenge in industrial and laboratory robotics. Conventional sensors struggle to obtain the full depth of transparent objects because light is refracted and reflected at their surfaces and because they lack visible texture. Previous research has attempted to obtain complete depth maps of transparent objects from RGB images and damaged depth maps (collected by a depth sensor) using deep learning models. However, existing methods fail to fully utilize the original depth map, resulting in limited accuracy for depth completion. To solve this problem, we propose TDCNet, a novel dual-branch CNN-Transformer parallel network for transparent object depth completion. The proposed framework consists of two different branches: one extracts features from partial depth maps, while the other processes RGB-D images. Experimental results demonstrate that our model achieves state-of-the-art performance across multiple public datasets. Our code and the pre-trained model are publicly available at https://github.com/XianghuiFan/TDCNet.

Paper Structure

This paper contains 15 sections, 7 equations, 6 figures, 7 tables.

Figures (6)

  • Figure 1: When encountering transparent objects, the depth map directly obtained from an RGB-D sensor is often incomplete, which poses challenges for robotic operations. The depth completion network can address this issue by reconstructing the incomplete depth map, enabling downstream applications such as robotic grasping or other manipulation tasks.
  • Figure 2: Comparison of popular architectures for transparent object depth completion. (a) The common earlier single-branch structure, where depth maps are usually added in a middle layer. (b) The earlier dual-branch structure with a fusion branch, where the fusion branch is used to fuse features from the original depth maps. (c) Our parallel dual-branch structure, where two branches with different backbones extract features from the original depth maps and the RGB-D image, respectively, and then fuse them.
  • Figure 3: The architecture of TDCNet. Our network consists of two parts: an encoder and a decoder. The encoder consists of two parallel branches and a fusion structure. The two branches use CNN-based and Transformer-based backbones to extract features from the original depth map and the RGB-D (4-channel) image, respectively, and our MFFM-based fusion structure collects and fuses the features from the two branches at multiple scales. The decoder comprises full convolution modules and upsampling modules, which process the encoder's features to produce the final depth map.
  • Figure 4: The structure of SA (spatial attention) and CA (channel attention), where SA and CA compute weight matrices instead of feature maps. The blue arrow on the right represents the ReLU mapping.
  • Figure 5: Visualization results on the TransCG dataset. Each pixel of the error map is the relative error $|d - d^*| / d^*$.
  • ...and 1 more figure
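
The per-pixel relative error used for the error map in Figure 5, $|d - d^*| / d^*$, can be sketched in a few lines. This is only an illustrative sketch, not the authors' evaluation code: the function name `relative_error_map`, the nested-list representation of a depth map, and the convention that a ground-truth value of 0 marks a missing pixel are all assumptions made here for the example.

```python
def relative_error_map(pred, gt, invalid=0.0):
    """Per-pixel relative error |d - d*| / d* for two equally sized depth maps.

    pred: predicted depths d, as a list of rows
    gt:   ground-truth depths d*, as a list of rows
    Pixels whose ground truth equals `invalid` are returned as None
    (assumption: 0 marks missing ground-truth depth, so the ratio is undefined).
    """
    if len(pred) != len(gt):
        raise ValueError("depth maps must have the same number of rows")
    error = []
    for pred_row, gt_row in zip(pred, gt):
        if len(pred_row) != len(gt_row):
            raise ValueError("depth maps must have the same number of columns")
        error.append([
            None if g == invalid else abs(p - g) / g
            for p, g in zip(pred_row, gt_row)
        ])
    return error


# Toy 2x2 depth maps in metres (made-up values, not taken from the paper).
pred = [[1.0, 2.2], [0.9, 0.5]]
gt = [[1.0, 2.0], [0.0, 0.4]]
err = relative_error_map(pred, gt)
```

Because the error is divided by the ground-truth depth, the same absolute deviation counts for more on nearby surfaces than on distant ones, which is why pixels without valid ground truth have to be excluded rather than treated as zero.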