Panoptic Segmentation
Alexander Kirillov, Kaiming He, Ross Girshick, Carsten Rother, Piotr Dollár
TL;DR
Panoptic segmentation (PS) is a unified vision task that combines semantic and instance segmentation into a single coherent output and is evaluated with a new panoptic quality (PQ) metric. PQ decomposes into segmentation quality (SQ) and recognition quality (RQ). It evaluates stuff and thing classes in the same way, matching a predicted segment to a ground-truth segment only when their IoU exceeds 0.5, a threshold that makes the matching unique. The paper grounds the task with human consistency studies and machine baselines on Cityscapes, ADE20k, and Mapillary Vistas. These show that PS is practical but challenging, and they reveal a notable gap between human and machine recognition, especially for small objects. The work aims to revive interest in unified scene understanding and to spur end-to-end PS models that jointly reason about all scene elements and produce non-overlapping segments.
Abstract
We propose and study a task we name panoptic segmentation (PS). Panoptic segmentation unifies the typically distinct tasks of semantic segmentation (assign a class label to each pixel) and instance segmentation (detect and segment each object instance). The proposed task requires generating a coherent scene segmentation that is rich and complete, an important step toward real-world vision systems. While early work in computer vision addressed related image/scene parsing tasks, these are not currently popular, possibly due to lack of appropriate metrics or associated recognition challenges. To address this, we propose a novel panoptic quality (PQ) metric that captures performance for all classes (stuff and things) in an interpretable and unified manner. Using the proposed metric, we perform a rigorous study of both human and machine performance for PS on three existing datasets, revealing interesting insights about the task. The aim of our work is to revive the interest of the community in a more unified view of image segmentation.
