Occlusion-Ordered Semantic Instance Segmentation

Soroosh Baselizadeh, Cheuk-To Yu, Olga Veksler, Yuri Boykov

TL;DR

This work addresses 3D reasoning from a single image by introducing Occlusion-Ordered Semantic Instance Segmentation (OOSIS), a joint task that couples occlusion-based relative depth ordering with semantic instance segmentation. The approach has two stages: a CNN predicts semantic labels and occlusion boundaries with their orientations, and a CRF-based labeling then infers instance masks and their occlusion order simultaneously, giving coherent 3D-like reasoning from 2D input. The authors introduce a novel oriented occlusion boundary model, a jump-move-based CRF optimization with a submodular upper bound, and a new OOSIS evaluation metric, the OAIR curve, that jointly measures mask accuracy and occlusion order correctness. Results on KINS and COCOA show better performance than strong baselines on the joint task, including baselines that address instance segmentation or occlusion ordering alone, which points to practical potential for improved 3D understanding of single-image scenes.

Abstract

Standard semantic instance segmentation provides useful but inherently 2D information from a single image. To enable 3D analysis, one usually integrates absolute monocular depth estimation with instance segmentation. However, monocular depth estimation is a difficult task. Instead, we leverage a simpler single-image task, occlusion-based relative depth ordering, which provides coarser but useful 3D information. We show that relative depth ordering works more reliably from occlusions than from absolute depth. We propose to solve the joint task of relative depth ordering and segmentation of instances based on occlusions. We call this task Occlusion-Ordered Semantic Instance Segmentation (OOSIS). We develop an approach to OOSIS that extracts instances and their occlusion order simultaneously from oriented occlusion boundaries and semantic segmentation. Unlike the popular detect-and-segment framework for instance segmentation, combining occlusion ordering with instance segmentation allows a simple and clean formulation of OOSIS as a labeling problem. As part of our solution for OOSIS, we develop a novel oriented occlusion boundary approach that significantly outperforms prior work. We also develop a new joint OOSIS metric based on both instance mask accuracy and the correctness of their occlusion order. We achieve better performance than strong baselines on the KINS and COCOA datasets.

Paper Structure

This paper contains 15 sections, 15 equations, 12 figures, 5 tables.

Figures (12)

  • Figure 1: Illustrates occlusion-ordered semantic instance segmentation (OOSIS). Given an image, we output: (1) instances, (2) their occlusion order, visualized here as a relative depth map, i.e. an instance has a larger intensity than any neighboring instances it occludes.
  • Figure 2: Overview of our approach.
  • Figure 3: Relationship between ${\mathbb B}_p$ and ${\mathbb O}_p$.
  • Figure 4: Illustration of our semantic segmentation and oriented occlusion boundaries. Images are: input, ground truth and our semantic segmentation, ground truth and our oriented occlusion boundaries after non-maximum suppression. The boundary color scheme: left-cyan, top-yellow, right-magenta, bottom-black. Best viewed zoomed in.
  • Figure 5: Jump move optimization. Warmer colors correspond to larger labels. Energies are listed.
  • ...and 7 more figures
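
The relative depth map described in the Figure 1 caption, where an instance is brighter than any neighboring instance it occludes, can be illustrated with a minimal sketch. This is not the authors' method: the function name `relative_depth_levels`, the pair-list input format, and the choice of the smallest consistent integer levels are all assumptions made for illustration, and the sketch assumes the occlusion relations contain no cycles.

```python
from collections import defaultdict


def relative_depth_levels(instances, occludes):
    """Assign each instance the smallest non-negative integer level such that
    level[a] > level[b] for every pair (a, b) meaning "a occludes b".

    Hypothetical helper for illustration only; assumes acyclic relations.
    """
    # Map each occluder to the list of instances it occludes.
    below = defaultdict(list)
    for occluder, occludee in occludes:
        below[occluder].append(occludee)

    level = {}
    in_progress = set()  # instances on the current recursion path

    def visit(node):
        if node in level:
            return level[node]
        if node in in_progress:
            raise ValueError("occlusion relations contain a cycle")
        in_progress.add(node)
        # One more than the highest level among occluded instances;
        # an instance that occludes nothing gets level 0.
        level[node] = 1 + max((visit(n) for n in below[node]), default=-1)
        in_progress.discard(node)
        return level[node]

    for inst in instances:
        visit(inst)
    return level


# Toy scene: a person in front of a car, which is in front of a tree.
levels = relative_depth_levels(
    ["car", "person", "tree"],
    [("person", "car"), ("car", "tree")],
)
print(levels)
```

Larger levels correspond to instances closer to the viewer, so rendering each instance mask with an intensity proportional to its level gives a map with the property stated in the caption.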