Instance-aware Semantic Segmentation via Multi-task Network Cascades

Jifeng Dai, Kaiming He, Jian Sun

TL;DR

The paper tackles instance-aware semantic segmentation by decomposing the task into three interconnected sub-tasks (differentiating instances, estimating masks, and categorizing objects) and solving them with a cascaded, multi-task CNN that shares convolutional features across stages. A differentiable RoI warping layer enables end-to-end training of the cascade by letting gradients flow through the predicted box coordinates and masks. The method reports state-of-the-art accuracy on PASCAL VOC and strong MS COCO segmentation results, and it runs at about 360 ms per image with VGG-16 because computation is shared and no external mask proposals are needed. The approach extends to deeper cascades (e.g., 5 stages) and deeper backbones (e.g., ResNet-101), which illustrates the method’s scalability for fast, high-quality instance segmentation.

Abstract

Semantic segmentation research has recently witnessed rapid progress, but many leading methods are unable to identify object instances. In this paper, we present Multi-task Network Cascades for instance-aware semantic segmentation. Our model consists of three networks, respectively differentiating instances, estimating masks, and categorizing objects. These networks form a cascaded structure, and are designed to share their convolutional features. We develop an algorithm for the nontrivial end-to-end training of this causal, cascaded structure. Our solution is a clean, single-step training framework and can be generalized to cascades that have more stages. We demonstrate state-of-the-art instance-aware semantic segmentation accuracy on PASCAL VOC. Meanwhile, our method takes only 360 ms to test an image using VGG-16, which is two orders of magnitude faster than previous systems for this challenging problem. As a by-product, our method also achieves compelling object detection results which surpass the competitive Fast/Faster R-CNN systems. The method described in this paper is the foundation of our submissions to the MS COCO 2015 segmentation competition, where we won 1st place.
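
To make the end-to-end training idea concrete, here is a minimal pure-Python sketch of an RoI warping step whose output varies smoothly with the box coordinates, which is what lets a later stage send gradients back to the boxes predicted by an earlier stage. It is an illustration under assumptions, not the authors' implementation: the text above only states that the layer is differentiable, so the use of bilinear interpolation, the evenly spaced sampling grid, and the names `bilinear_sample`, `roi_warp`, and `box_gradient_fd` are all hypothetical. A central finite difference stands in for an analytic backward pass.

```python
import math


def bilinear_sample(feature, x, y):
    """Sample a 2-D feature map (list of rows) at a real-valued point (x, y)."""
    h, w = len(feature), len(feature[0])
    # Clamp so that samples on or beyond the border stay defined.
    x = min(max(x, 0.0), w - 1.0)
    y = min(max(y, 0.0), h - 1.0)
    x0, y0 = int(math.floor(x)), int(math.floor(y))
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    ax, ay = x - x0, y - y0
    top = (1 - ax) * feature[y0][x0] + ax * feature[y0][x1]
    bottom = (1 - ax) * feature[y1][x0] + ax * feature[y1][x1]
    return (1 - ay) * top + ay * bottom


def roi_warp(feature, box, out_h, out_w):
    """Warp the region box = (x_min, y_min, x_max, y_max) to out_h x out_w.

    Assumption: sample points are evenly spaced over the box, endpoints included.
    """
    x_min, y_min, x_max, y_max = box
    warped = []
    for i in range(out_h):
        ty = i / (out_h - 1) if out_h > 1 else 0.5
        y = y_min + ty * (y_max - y_min)
        row = []
        for j in range(out_w):
            tx = j / (out_w - 1) if out_w > 1 else 0.5
            x = x_min + tx * (x_max - x_min)
            row.append(bilinear_sample(feature, x, y))
        warped.append(row)
    return warped


def box_gradient_fd(feature, box, out_h, out_w, coord, eps=1e-4):
    """Central finite difference of the summed warped output w.r.t. box[coord]."""
    def total(b):
        return sum(sum(row) for row in roi_warp(feature, tuple(b), out_h, out_w))

    plus, minus = list(box), list(box)
    plus[coord] += eps
    minus[coord] -= eps
    return (total(plus) - total(minus)) / (2 * eps)


# Toy feature map with value 4*row + col, so interpolation is exact.
feature = [[4.0 * r + c for c in range(4)] for r in range(4)]
box = (0.5, 0.5, 2.5, 2.5)

warped = roi_warp(feature, box, 2, 2)
grad_x_min = box_gradient_fd(feature, box, 2, 2, coord=0)
print(warped)
print(grad_x_min)
```

Because each output value is a smooth function of the four box coordinates (away from integer grid lines), a loss computed on the warped features has a well-defined gradient with respect to the box. A hard crop-and-quantize step would not provide this.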

Paper Structure

This paper contains 13 sections, 9 equations, 5 figures, 5 tables.

Figures (5)

  • Figure 1: Illustrations of common multi-task learning (left) and our multi-task cascade (right).
  • Figure 2: Multi-task Network Cascades for instance-aware semantic segmentation. At the top right corner is a simplified illustration.
  • Figure 3: A 5-stage cascade. On stage 3, bounding boxes updated by the box regression layer are used as the input to stage 4.
  • Figure 4: Our instance-aware semantic segmentation results on the PASCAL VOC 2012 validation set. One color denotes one instance.
  • Figure 5: Our instance-aware semantic segmentation results on the MS COCO test-dev set using ResNet-101 [He2015a].