
Depth-Guided Semi-Supervised Instance Segmentation

Xin Chen, Jie Hu, Xiawu Zheng, Jianghang Lin, Liujuan Cao, Rongrong Ji

TL;DR

This work proposes Depth Feature Fusion, which integrates features extracted by a depth estimation model into the SSIS process, and sets a new state of the art for SSIS, outperforming previous methods.

Abstract

Semi-Supervised Instance Segmentation (SSIS) aims to leverage large amounts of unlabeled data during training. Previous frameworks primarily used the RGB information of unlabeled images to generate pseudo-labels. However, this mechanism often introduces unstable noise, as a single instance can display multiple RGB values. To overcome this limitation, we introduce a Depth-Guided (DG) SSIS framework. This framework uses depth maps extracted from the input images, in which each instance is represented by closely related distance values, providing precise contours for distinct instances. Because depth maps offer a perspective different from that of RGB data, integrating them into the SSIS process is not straightforward. To this end, we propose Depth Feature Fusion, which integrates features extracted by a depth estimation model. This integration helps the model better understand depth information and ensures that it is used effectively. Additionally, to manage the variability of depth maps during training, we introduce the Depth Controller. This component adaptively adjusts the depth map, speeding up convergence and dynamically balancing the loss weights between RGB and depth maps. Extensive experiments on the COCO and Cityscapes datasets validate the efficacy of the proposed method. Our approach sets a new state of the art for SSIS, outperforming previous methods. Specifically, DG achieves 22.29%, 31.47%, and 35.14% mAP with 1%, 5%, and 10% labeled data on COCO, respectively.
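The abstract's premise is that pixels belonging to one instance share closely related depth values, so depth discontinuities mark instance contours even where colors are similar. The toy sketch below illustrates only that premise and is not part of DG, which uses depth to guide a learned segmenter rather than segmenting depth maps directly. It flood-fills 4-connected pixels whose neighboring depths differ by less than a tolerance `tau`; the function name and `tau` are hypothetical.

```python
from collections import deque
from typing import List


def depth_regions(depth: List[List[float]], tau: float) -> List[List[int]]:
    """Label 4-connected regions whose neighboring depths differ by < tau."""
    h, w = len(depth), len(depth[0])
    labels = [[-1] * w for _ in range(h)]  # -1 marks "not yet visited"
    next_label = 0
    for sy in range(h):
        for sx in range(w):
            if labels[sy][sx] != -1:
                continue
            # Breadth-first flood fill from an unvisited seed pixel.
            labels[sy][sx] = next_label
            queue = deque([(sy, sx)])
            while queue:
                y, x = queue.popleft()
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < h and 0 <= nx < w and labels[ny][nx] == -1
                            and abs(depth[ny][nx] - depth[y][x]) < tau):
                        labels[ny][nx] = next_label
                        queue.append((ny, nx))
            next_label += 1
    return labels


if __name__ == "__main__":
    # Background at depth ~9, a near "car" at ~3, a farther "car" at ~5.
    toy = [
        [9.0, 9.0, 9.0, 9.0, 9.0],
        [9.0, 3.0, 3.1, 5.0, 9.0],
        [9.0, 3.0, 3.2, 5.1, 9.0],
        [9.0, 9.0, 9.0, 9.0, 9.0],
    ]
    for row in depth_regions(toy, tau=0.5):
        print(row)  # labels 0 (background), 1 (near car), 2 (farther car)
```

Because the test compares neighboring pixels rather than a region's mean, a smooth depth ramp (for example, a road receding into the distance) chains into a single region; this is one reason raw depth is a guide for, not a replacement of, learned instance segmentation.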
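The abstract says the Depth Controller dynamically balances the loss weights between RGB and depth maps, but it does not give the rule. The sketch below assumes one simple possibility, inverse-magnitude balancing, in which each term's weight is proportional to the other term's current loss so that neither dominates. This is not the paper's formula, and `balance_weights` and `unsupervised_loss` are hypothetical names operating on scalar loss values.

```python
from typing import Tuple


def balance_weights(rgb_loss: float, depth_loss: float,
                    eps: float = 1e-8) -> Tuple[float, float]:
    """Hypothetical inverse-magnitude balancing (not the paper's formula).

    Each weight is proportional to the *other* term's loss, so the larger
    loss receives the smaller weight and the weights sum to (almost) 1.
    """
    total = rgb_loss + depth_loss + eps  # eps avoids division by zero
    w_rgb = depth_loss / total
    w_depth = rgb_loss / total
    return w_rgb, w_depth


def unsupervised_loss(rgb_loss: float, depth_loss: float) -> float:
    """Combine the RGB- and depth-guided pseudo-label losses."""
    # In an autograd framework the weights would be computed from detached
    # loss values, so they act as constants and receive no gradient.
    w_rgb, w_depth = balance_weights(rgb_loss, depth_loss)
    return w_rgb * rgb_loss + w_depth * depth_loss


if __name__ == "__main__":
    # The larger depth-guided loss receives the smaller weight.
    print(balance_weights(0.2, 0.6))  # approximately (0.75, 0.25)
    print(unsupervised_loss(0.2, 0.6))  # approximately 0.3
```

A schedule based on training progress (for example, ramping the depth weight up over time) would be an equally plausible reading of "dynamically balancing"; the inverse-magnitude rule is shown only because it needs no extra hyperparameters.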

Paper Structure

This paper contains 18 sections, 8 equations, 8 figures, and 5 tables.

Figures (8)

  • Figure 1: To illustrate the characteristics of depth maps and how they complement RGB images in capturing spatial information, we present segmentation results obtained from RGB images and from depth maps. (a) and (b) are from COCO [lin2014microsoft]; (c) is from Cityscapes [Cordts2016Cityscapes]. (a) The depth map ignores the objects reflected in the mirror. (b) The depth map captures the dark areas of the boat hull and the fence. (c) The depth map favors the segmentation of foreground vehicles, and the result treats each car as a single entity.
  • Figure 2: Framework and training methodology of Depth-Guided (DG). DG focuses on exploiting depth information. To this end, we incorporate a frozen, pre-trained depth detector into the framework and propose Depth Fusion (DF) and the Depth Controller (DC) to ensure that depth information is used effectively. Specifically, the depth map and the RGB image are fed to the teacher model to obtain pseudo-labels, which supervise the student model. Meanwhile, features from the depth detector's decoder are fused with the student model's backbone. Finally, the Depth Controller adaptively adjusts the weight of the depth map in the unsupervised loss.
  • Figure 3: Details of Depth Fusion (DF). Unlabeled images ($D_u$) are fed into both the student model and the depth estimation model. The features from the student's backbone are fused with the features extracted by the depth estimation model. Finally, the fused features are passed to the segmentation head, which outputs the predicted instance masks.
  • Figure 4: Comparison of results on COCO. The evaluation metric is AP50. $^{\dagger}$ denotes supervised results taken from [Hu2023pseudolabel]. $^\ast$ denotes results reproduced in [berrada2023guided].
  • Figure 5: Effect of DC on convergence speed.
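The Figure 2 caption states that the teacher's outputs serve as pseudo-labels for the student, but not how raw predictions become labels. A common recipe in teacher-student SSIS, assumed here, is to keep only confident instances and binarize their soft masks. The thresholds `score_thr=0.7` and `mask_thr=0.5` and all names are hypothetical, and the sketch starts from the teacher's outputs without modeling how the depth map enters the teacher.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class TeacherOutput:
    label: int                    # predicted class id
    score: float                  # instance confidence in [0, 1]
    mask_prob: List[List[float]]  # per-pixel foreground probability


@dataclass
class PseudoLabel:
    label: int
    mask: List[List[int]]         # binary mask that supervises the student


def make_pseudo_labels(outputs: List[TeacherOutput],
                       score_thr: float = 0.7,
                       mask_thr: float = 0.5) -> List[PseudoLabel]:
    pseudo = []
    for out in outputs:
        if out.score < score_thr:
            continue  # low-confidence instances are treated as unreliable
        # Binarize the soft mask at mask_thr.
        mask = [[1 if p >= mask_thr else 0 for p in row] for row in out.mask_prob]
        if any(any(row) for row in mask):  # drop instances with empty masks
            pseudo.append(PseudoLabel(out.label, mask))
    return pseudo
```

For example, a teacher output with score 0.4 is discarded even if its mask looks clean, while a 0.95-score output whose mask probabilities all fall below 0.5 is discarded because its binarized mask is empty.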
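The Figure 3 caption says the student's backbone features are "fused" with features from the depth estimation model but does not name the operator. One common choice, assumed here and not confirmed by the caption, is channel concatenation followed by a 1x1 convolution. The sketch uses plain nested lists for clarity, assumes both maps already share a spatial resolution (a real model would resize one of them), and `fuse_concat_1x1` is a hypothetical name; in practice the weights would be learned rather than supplied.

```python
from typing import List

# Feature maps as nested lists indexed [channel][row][col].
FeatureMap = List[List[List[float]]]


def fuse_concat_1x1(student: FeatureMap, depth: FeatureMap,
                    weight: List[List[float]], bias: List[float]) -> FeatureMap:
    """Concatenate along channels, then apply a 1x1 convolution.

    weight has shape [out_channels][student_channels + depth_channels].
    """
    h, w = len(student[0]), len(student[0][0])
    if len(depth[0]) != h or len(depth[0][0]) != w:
        raise ValueError("student and depth features must share spatial size")
    if len(weight) != len(bias):
        raise ValueError("need one bias per output channel")
    stacked = student + depth  # channel-wise concatenation
    c_in = len(stacked)
    fused: FeatureMap = []
    for w_row, b in zip(weight, bias):
        if len(w_row) != c_in:
            raise ValueError("each weight row needs one entry per input channel")
        # A 1x1 convolution is a per-pixel linear mix of the input channels.
        fused.append([[b + sum(w_row[c] * stacked[c][y][x] for c in range(c_in))
                       for x in range(w)] for y in range(h)])
    return fused
```

Concatenation keeps both feature sets intact and lets the 1x1 weights decide how much each depth channel contributes; element-wise addition would be a cheaper alternative but requires the two maps to have matching channel counts.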
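Figure 4 reports AP50, the average precision when a predicted mask counts as correct only if its IoU with a not-yet-matched ground-truth mask is at least 0.5. The sketch below is a simplified single-image, single-class version for intuition only. The official COCO evaluator (pycocotools) additionally averages over classes, samples precision at 101 recall points, caps detections per image, and handles crowd annotations, so its numbers will differ from this all-point version.

```python
from typing import List, Tuple

Mask = List[List[int]]  # binary mask, 1 = instance pixel


def mask_iou(a: Mask, b: Mask) -> float:
    """Intersection over union of two equally sized binary masks."""
    inter = sum(x & y for ra, rb in zip(a, b) for x, y in zip(ra, rb))
    union = sum(x | y for ra, rb in zip(a, b) for x, y in zip(ra, rb))
    return inter / union if union else 0.0


def ap50(preds: List[Tuple[float, Mask]], gts: List[Mask]) -> float:
    """Simplified AP at IoU >= 0.5 for one image and one class."""
    if not gts:
        return 0.0
    ranked = sorted(preds, key=lambda p: p[0], reverse=True)  # by score
    matched = [False] * len(gts)
    tp, precisions, recalls = 0, [], []
    for k, (_, mask) in enumerate(ranked, start=1):
        # Greedily match each prediction to its best still-unmatched ground truth.
        best_iou, best_j = 0.0, -1
        for j, gt in enumerate(gts):
            if not matched[j]:
                iou = mask_iou(mask, gt)
                if iou > best_iou:
                    best_iou, best_j = iou, j
        if best_iou >= 0.5:
            matched[best_j] = True
            tp += 1
        precisions.append(tp / k)
        recalls.append(tp / len(gts))
    # All-point interpolation: precision at each rank becomes the maximum
    # precision achieved at that rank or any later one.
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])
    ap, prev_recall = 0.0, 0.0
    for p, r in zip(precisions, recalls):
        ap += (r - prev_recall) * p  # area of each recall step
        prev_recall = r
    return ap
```

With two ground-truth masks and three ranked predictions (correct, spurious, correct), the precision-recall steps are (recall 0.5, precision 1.0) and (recall 1.0, precision 2/3), giving an AP50 of 5/6.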