Synthetic Instance Segmentation from Semantic Image Segmentation Masks

Yuchen Shen, Dong Zhang, Zhao Zhang, Liyong Fu, Qiaolin Ye

TL;DR

Synthetic Instance Segmentation (SISeg) achieves instance segmentation results by leveraging image masks generated by existing semantic segmentation models, and it is highly efficient as it does not require additional training for semantic segmentation or the use of instance-level image annotations.

Abstract

In recent years, instance segmentation has garnered significant attention across various applications. However, training a fully-supervised instance segmentation model requires costly instance-level and pixel-level annotations. In contrast, weakly-supervised instance segmentation methods, such as those using image-level class labels or point labels, often struggle to satisfy the accuracy and recall requirements of practical scenarios. In this paper, we propose a novel paradigm called Synthetic Instance Segmentation (SISeg). SISeg achieves instance segmentation results by leveraging image masks generated by existing semantic segmentation models, and it is highly efficient because it requires neither additional training for semantic segmentation nor instance-level image annotations. In other words, the proposed model needs no extra annotation effort and no higher computational expense. Specifically, we first obtain a semantic segmentation mask of the input image via an existing semantic segmentation model. Then, we calculate a displacement field vector for each pixel based on the segmentation mask, which distinguishes pixels that belong to the same class but to different instances, thereby providing instance-level object information. Finally, the instance segmentation results are refined by a learnable category-agnostic object boundary branch. Extensive experiments on two challenging datasets show that SISeg achieves competitive results compared with state-of-the-art methods, especially fully-supervised methods. The code will be released at: SISeg
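
To make the displacement-field idea in the abstract concrete, the following is a minimal sketch, not the authors' method. SISeg predicts the field with a learned branch; this toy version instead assumes that each 4-connected region of a class is one instance and points every pixel at its region centroid. The function name `displacement_field` and the list-of-lists mask format are hypothetical choices made for illustration, and this stand-in cannot separate touching instances of the same class.

```python
from collections import deque


def displacement_field(mask, background=0):
    """Toy displacement field for a 2D semantic mask (list of lists of class ids).

    Assumption for illustration only: each 4-connected region of one class is
    treated as a single instance. Returns (field, inst), where field[y][x] is
    the (dy, dx) offset from the pixel to its region centroid and inst[y][x]
    is the region id (0 for background).
    """
    h, w = len(mask), len(mask[0])
    inst = [[0] * w for _ in range(h)]
    field = [[(0.0, 0.0)] * w for _ in range(h)]
    next_id = 1
    for r in range(h):
        for c in range(w):
            if mask[r][c] == background or inst[r][c]:
                continue
            # Breadth-first search over same-class, 4-connected neighbours.
            queue, pixels = deque([(r, c)]), []
            inst[r][c] = next_id
            while queue:
                y, x = queue.popleft()
                pixels.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < h and 0 <= nx < w
                            and not inst[ny][nx]
                            and mask[ny][nx] == mask[r][c]):
                        inst[ny][nx] = next_id
                        queue.append((ny, nx))
            # Every pixel of the region points at the region centroid.
            cy = sum(p[0] for p in pixels) / len(pixels)
            cx = sum(p[1] for p in pixels) / len(pixels)
            for y, x in pixels:
                field[y][x] = (cy - y, cx - x)
            next_id += 1
    return field, inst


# Two separate regions of class 1 and one region of class 2.
mask = [
    [1, 1, 0, 1, 1],
    [1, 1, 0, 1, 1],
    [0, 0, 0, 0, 0],
    [2, 2, 2, 0, 0],
]
field, inst = displacement_field(mask)
```

In this toy mask the two class-1 blobs receive different region ids even though they share a semantic label, which is the kind of same-class, different-instance information the displacement field is meant to carry.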

Paper Structure (23 sections, 19 equations, 5 figures, 4 tables)

Figures (5)

  • Figure 1: Comparison of (a) existing instance segmentation pipelines and (b) our proposed SISeg, which generates instance segmentation results from semantic image segmentation masks. Unlike existing methods, ours does not require instance-level annotations and can be derived from existing semantic segmentation masks, resulting in higher efficiency. Samples are from the PASCAL VOC 2012 dataset (Everingham et al., 2010).
  • Figure 2: The architecture of our proposed SISeg (a) mainly consists of two parallel branches sharing a backbone: the DFM branch (b), which separates instances by predicting the displacement field, and the CBR branch (c), which improves segmentation results along boundaries by calculating the semantic similarity of pixel pairs. Together, the two branches produce instance segmentation from existing semantic masks without instance-level annotations. The whole network is optimized by minimizing both losses $L_D$ and $L_B$ on pixel-level annotated data. The $T$ function transforms the ground-truth label map into the semantic similarity labels used to calculate $L_B$. The samples are selected from the val set of PASCAL VOC 2012 (Everingham et al., 2010).
  • Figure 3: Visual comparisons on PASCAL VOC 2012 (Everingham et al., 2010) using OCRNet (Yuan et al., 2020) as the Baseline. "Baseline + DFM" and "Baseline + DFM + CBR" denote applying DFM alone and the combination of DFM and CBR to the Baseline, respectively. Compared with the baseline model, our proposed DFM predicts the instance segmentation mask from semantic segmentation, and its object class boundaries are further refined by our CBR. The white dashed boxes highlight areas that are progressively improved by our proposed modules. Notably, the instance-level ground truth is used only for model evaluation and is shown here for comparison to highlight the effectiveness of our approach.
  • Figure 4: Visualization results for instance segmentation on PASCAL VOC 2012 (Everingham et al., 2010). PSPNet (Zhao et al., 2017) is applied for semantic segmentation. We visualize the 2D displacement field by encoding offset vectors in color. The last two rows display two failure cases. It is worth noting that the instance-level ground truth is not available during model training.
  • Figure 5: More visual examples of our instance segmentation model on ADE20K (Zhou et al., 2017).
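
The Figure 2 caption mentions a $T$ function that turns the ground-truth label map into semantic similarity labels for pixel pairs, but it does not define it. The sketch below shows one plausible reading and should be treated as an assumption: a pair of nearby pixels is labeled 1 if both pixels share a class and 0 otherwise. The name `pair_similarity_labels`, the `radius` parameter, and the square neighborhood are hypothetical choices for illustration, not details taken from the paper.

```python
def pair_similarity_labels(label_map, radius=1):
    """Assumed form of T: label nearby pixel pairs 1 (same class) or 0 (different).

    label_map is a 2D list of class ids. Returns a dict that maps
    ((y1, x1), (y2, x2)) to 0 or 1, with each unordered pair listed once.
    """
    h, w = len(label_map), len(label_map[0])
    # "Forward" offsets only, so that each unordered pair appears exactly once.
    offsets = [
        (dy, dx)
        for dy in range(0, radius + 1)
        for dx in range(-radius, radius + 1)
        if (dy, dx) > (0, 0)
    ]
    labels = {}
    for y in range(h):
        for x in range(w):
            for dy, dx in offsets:
                ny, nx = y + dy, x + dx
                if 0 <= ny < h and 0 <= nx < w:
                    same = label_map[y][x] == label_map[ny][nx]
                    labels[((y, x), (ny, nx))] = int(same)
    return labels


# A tiny label map with a diagonal boundary between classes 1 and 2.
label_map = [
    [1, 1, 2],
    [1, 2, 2],
]
pairs = pair_similarity_labels(label_map)
```

Under this reading, pairs labeled 0 straddle a class boundary, so a boundary branch trained against such labels only needs pixel-level class annotations, which is consistent with the caption's statement that the network is optimized on pixel-level annotated data.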