SOGNet: Scene Overlap Graph Network for Panoptic Segmentation

Yibo Yang, Hongyang Li, Xia Li, Qijie Zhao, Jianlong Wu, Zhouchen Lin

TL;DR

This work tackles panoptic segmentation by explicitly modeling and resolving overlaps between instance masks. It introduces SOGNet, which builds a scene overlap graph using category, geometry, and appearance features to produce a relation matrix that encodes overlaps, and includes a differentiable overlap-resolving module to remove conflicts before final panoptic prediction. Lacking direct supervision for overlaps, the model leverages panoptic supervision and a weakly supervised overlap loss to guide relation learning, achieving state-of-the-art results on COCO and Cityscapes. The approach provides interpretable overlap relations and competitive performance, marking a significant step toward unified, overlap-aware panoptic segmentation.

Abstract

The panoptic segmentation task requires a unified result from semantic and instance segmentation outputs that may contain overlaps. However, current studies largely ignore overlap modeling. In this study, we aim to model overlap relations among instances and resolve them for panoptic segmentation. Inspired by scene graph representation, we formulate the overlapping problem as a simplified case, named the scene overlap graph. We leverage each object's category, geometry, and appearance features to perform relational embedding, and output a relation matrix that encodes overlap relations. To overcome the lack of supervision, we introduce a differentiable module to resolve the overlap between any pair of instances. The mask logits after removing overlaps are fed into per-pixel instance id classification, which leverages the panoptic supervision to assist in the modeling of overlap relations. In addition, we generate an approximate ground truth of overlap relations as weak supervision, to quantify the accuracy of the overlap relations predicted by our method. Experiments on COCO and Cityscapes demonstrate that our method accurately predicts overlap relations and outperforms the state of the art in panoptic segmentation. Our method also won the Innovation Award in the COCO 2019 challenge.

Paper Structure

This paper contains 21 sections, 14 equations, 5 figures, 4 tables.

Figures (5)

  • Figure 1: Instance segmentation has overlapping regions for objects, while panoptic segmentation requires a unified result for each pixel. Our study aims to explicitly predict overlap relations and resolve overlaps for the panoptic output.
  • Figure 2: An illustration of SOGNet for panoptic segmentation. During training, the instance ground truths are the input to our relational embedding module. During inference, they are replaced with the predictions from the instance segmentation head. The architecture is trained in an end-to-end manner. $\sigma$ denotes the ReLU non-linear function.
  • Figure 3: Visualization of the overlap relations encoded by $O$ (bottom left) and the approximate ground truth, $R^{\star}$ (bottom right). Note that an activation at location $(i,j)$ indicates that instance $i$ is covered by (lies below) instance $j$. The indices of instances are marked in the images. Zoom in for a better view. More visualization results can be found in the supplementary material.
  • Figure 4: Visualization of panoptic segmentation results from heuristic inference and from SOGNet.
  • Figure 5: More visualization examples of the overlap relations predicted from $O$ in SOGNet, and their corresponding approximate ground truths, $R^{\star}$. Note that an activation at location $(i,j)$ indicates that instance $i$ is covered by (lies below) instance $j$. The indices of instances are marked in the images. Zoom in for a better view.
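
The $(i,j)$ convention stated in the captions of Figures 3 and 5 can be illustrated with a small NumPy sketch. It is not the paper's exact formulation: the function name `resolve_overlaps`, the use of soft masks in $[0,1]$ rather than mask logits, and the multiplicative suppression rule are all assumptions chosen for illustration. The sketch only shows how a relation matrix whose entry $(i,j)$ is active can be used to remove from instance $i$ the pixels that instance $j$ covers.

```python
import numpy as np


def resolve_overlaps(masks, relation):
    """Suppress covered pixels in each instance mask (illustrative only).

    masks:    array of shape (N, H, W), soft instance masks in [0, 1].
    relation: array of shape (N, N); relation[i, j] close to 1 means
              instance i is covered by (lies below) instance j.
    """
    masks = np.asarray(masks, dtype=float)
    relation = np.asarray(relation, dtype=float)
    # covering[i] accumulates the masks of all instances that lie above i.
    covering = np.einsum("ij,jhw->ihw", relation, masks)
    covering = np.clip(covering, 0.0, 1.0)
    # Keep a pixel of instance i only where nothing above it is active.
    return masks * (1.0 - covering)


# Two instances on a 1x3 image that share the middle pixel.
masks = np.array([
    [[1.0, 1.0, 0.0]],  # instance 0
    [[0.0, 1.0, 1.0]],  # instance 1
])
# Activation at (0, 1): instance 0 is covered by instance 1.
relation = np.array([
    [0.0, 1.0],
    [0.0, 0.0],
])
resolved = resolve_overlaps(masks, relation)
print(resolved[0])  # instance 0 loses the shared pixel
print(resolved[1])  # instance 1 is unchanged
```

Because the rule is built from multiplications and a sum (plus a clip), gradients could flow back to the relation matrix if the same arithmetic were written in an autodiff framework, which is the property the abstract relies on when it uses panoptic supervision to guide relation learning.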