Self-Supervised Learning for Transparent Object Depth Completion Using Depth from Non-Transparent Objects

Xianghui Fan, Zhaoyu Chen, Mengyang Pan, Anping Deng, Hang Yang

TL;DR

This work tackles the difficulty of obtaining depth for transparent objects by proposing a self-supervised depth completion framework that simulates depth deficits in non-transparent regions using segmentation-guided masking and the original depth as supervision. The approach uses a masking strategy to create realistic training pairs, trains a two-branch network, and then fine-tunes with full RGB-D data, achieving performance close to supervised methods. Experiments on the TransCG dataset show that the self-supervised method provides substantial benefits, particularly when labeled data are scarce, and ablations validate the design choices. The work offers a practical path to reducing labeling costs in transparent-object depth completion and demonstrates the value of targeted self-supervision for this domain.

Abstract

The perception of transparent objects is a well-known challenge in computer vision. Conventional depth sensors have difficulty sensing the depth of transparent objects because of the refraction and reflection of light. Previous research has typically trained a neural network to complete the depth acquired by the sensor, an approach that can produce accurate depth maps of transparent objects quickly. However, such training relies on a large amount of annotated data for supervision, and labeling depth maps is costly. To tackle this challenge, we propose a new self-supervised method for training depth completion networks. Our method simulates the depth deficits of transparent objects within non-transparent regions and uses the original depth map as ground truth for supervision. Experiments demonstrate that our method achieves performance comparable to supervised approaches, and that pre-training with our method can improve model performance when training samples are scarce.
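The core idea in the abstract, removing depth inside a non-transparent region and supervising with the original depth, can be illustrated with a minimal NumPy sketch. This is not the authors' masking strategy: the function name `simulate_depth_deficit`, the `drop_ratio` parameter, and the convention that a depth of 0 means "missing" are all assumptions made for illustration.

```python
import numpy as np

def simulate_depth_deficit(depth, object_mask, rng, drop_ratio=0.8):
    """Remove depth inside a non-transparent object region (illustrative only).

    depth:       (H, W) float array of sensor depth; 0 is assumed to mean missing.
    object_mask: (H, W) bool array, True on a non-transparent object,
                 e.g. from a segmentation model (hypothetical input).
    drop_ratio:  fraction of masked pixels to drop (assumed value, not from the paper).
    Returns (masked_depth, target_depth, loss_mask).
    """
    # Randomly choose which pixels inside the object region lose their depth.
    drop = object_mask & (rng.random(depth.shape) < drop_ratio)
    masked_depth = np.where(drop, 0.0, depth)
    # Supervise only where depth was removed and the original value was valid.
    loss_mask = drop & (depth > 0)
    # The untouched sensor depth serves as the training target.
    return masked_depth, depth, loss_mask

rng = np.random.default_rng(0)
depth = np.full((4, 4), 1.5)
object_mask = np.zeros((4, 4), dtype=bool)
object_mask[1:3, 1:3] = True
masked, target, loss_mask = simulate_depth_deficit(depth, object_mask, rng, drop_ratio=1.0)
```

A completion network would then take `masked` (together with the RGB image) as input, and its loss would be evaluated against `target` on the pixels selected by `loss_mask`.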

Paper Structure

This paper contains 8 sections, 9 equations, 5 figures, 3 tables.

Figures (5)

  • Figure 1: When the RGB-D sensor captures the depth of a transparent object, a localized depth deficit occurs. We simulate this effect in the non-transparent object region through artificial masking.
  • Figure 2: Pipelines of supervised learning and our self-supervised learning. We use masked input data from the supervised process to perform self-supervised learning, without relying on the full transparent object depth map at any point.
  • Figure 3: Our masking strategy
  • Figure 4: Qualitative results of our full self-supervised learning method compared to other methods on the TransCG dataset. Each pixel of the error map is calculated by the following relative error: $|d - d^*| / d^*$. The closer the pixel color is to the background, the smaller the relative error, whereas the closer it is to red, the larger the relative error.
  • Figure 5: Qualitative comparison of fine-tuning results using our self-supervised learning approach versus pre-training with other methods on the TransCG dataset. Each pixel of the error map is calculated by the following relative error: $|d - d^*| / d^*$. The closer the pixel color is to the background, the smaller the relative error, whereas the closer it is to red, the larger the relative error.
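
The per-pixel relative error used in the Figure 4 and Figure 5 error maps, $|d - d^*| / d^*$, can be computed as in the sketch below. The function name `relative_error_map` is hypothetical, and setting the error to 0 where the ground-truth depth is not positive is an assumption made here to avoid division by zero, not something the captions specify.

```python
import numpy as np

def relative_error_map(pred, gt):
    """Per-pixel relative error |d - d*| / d* between predicted and true depth."""
    pred = np.asarray(pred, dtype=np.float64)
    gt = np.asarray(gt, dtype=np.float64)
    # Assumption: pixels without valid ground truth (gt <= 0) get zero error.
    valid = gt > 0
    err = np.zeros_like(gt)
    err[valid] = np.abs(pred[valid] - gt[valid]) / gt[valid]
    return err

pred = [[1.1, 2.0], [0.5, 3.0]]
gt = [[1.0, 2.0], [0.0, 4.0]]
err = relative_error_map(pred, gt)
```

For visualization, `err` would be mapped through a colormap so that small values blend into the background and large values appear red, as the captions describe.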