Open-World Panoptic Segmentation

Matteo Sodano, Federico Magistri, Jens Behley, Cyrill Stachniss

TL;DR

This work tackles open-world panoptic segmentation by introducing Con2MAV, a lightweight encoder with three decoders (semantic, contrastive, and instance) that jointly learns per-class descriptors, anomaly separation, and class-agnostic instance clustering to discover new categories and instances at test time. It extends prior ContMAV with a novel pre-logit descriptor strategy and vector-field losses, and it introduces PANIC, a challenging autonomous-driving dataset with 800 images and over 50 unknown classes to evaluate anomaly, open-world semantic, open-set and open-world panoptic tasks. Across SegmentMeIfYouCan, BDDAnomaly, COCO, SUIM, and PANIC, Con2MAV achieves state-of-the-art performance on open-world tasks while maintaining competitive closed-world accuracy, and the PANIC benchmark reveals substantial room for progress in open-world segmentation. The work provides a unified nomenclature, comprehensive benchmarks, and a pathway for future research toward robust, safe perception in unconstrained environments.

Abstract

Perception is a key building block of autonomously acting vision systems such as autonomous vehicles. It is crucial that these systems are able to understand their surroundings in order to operate safely and robustly. Additionally, autonomous systems deployed in unconstrained real-world scenarios must be capable of dealing with novel situations and objects that have never been seen before. In this article, we tackle the problem of open-world panoptic segmentation, i.e., the task of discovering new semantic categories and new object instances at test time, while enforcing consistency among the categories that we incrementally discover. We propose Con2MAV, an approach for open-world panoptic segmentation that extends our previous work, ContMAV, which was developed for open-world semantic segmentation. Through extensive experiments across multiple datasets, we show that our model achieves state-of-the-art results on open-world segmentation tasks, while still performing competitively on the known categories. We will open-source our implementation upon acceptance. Additionally, we propose PANIC (Panoptic ANomalies In Context), a benchmark for evaluating open-world panoptic segmentation in autonomous driving scenarios. This dataset, recorded with a multi-modal sensor suite mounted on a car, provides high-quality, pixel-wise annotations of anomalous objects at both the semantic and the instance level. Our dataset contains 800 images, with more than 50 unknown classes, i.e., classes that do not appear in the training set, and 4000 object instances, making it an extremely challenging dataset for open-world segmentation tasks in the autonomous driving scenario. We provide competitions for multiple open-world tasks on a hidden test set. Our dataset and competitions are available at https://www.ipb.uni-bonn.de/data/panic.

Paper Structure

This paper contains 28 sections, 21 equations, 13 figures, 19 tables.

Figures (13)

  • Figure 1: Our proposed approach, Con2MAV, is able to tackle multiple open-world tasks and segment unknown objects and categories in multiple datasets spanning multiple domains. In the figure, we show predictions on SegmentMeIfYouCan [chan2021neurips] for anomaly segmentation, SUIM [islam2020iros] for open-world semantic segmentation, COCO [lin2014eccv] for open-set panoptic segmentation, and PANIC (ours) for open-world panoptic segmentation (we show only the instance mask for clarity). We show further qualitative examples in the experiments section and in the supplementary material.
  • Figure 2: Our dataset, PANIC, provides pixel-wise annotations of unknown semantic categories and object instances in RGB images. The images were recorded with a sensor suite [vizzo2023itsc] mounted on a vehicle driving in Bonn, Germany. The dataset consists of images collected at different times of day over the span of more than a year, and it contains more than 50 different unknown classes that are not present in Cityscapes.
  • Figure 3: A schematic breakdown of the task discussed in this paper.
  • Figure 4: Our network processes an RGB image via an encoder and three decoders and yields the final open-world panoptic segmentation result.
  • Figure 5: Qualitative results of our approach, Con2MAV, on open-world semantic segmentation on SUIM (top row) and PANIC (bottom row). The prediction mask is overlaid on the input RGB image for clarity. In the prediction, different colors correspond to different predicted classes. We compare our approach, Con2MAV (right), with our previous method, ContMAV (center).
  • ...and 8 more figures