Open-World Semantic Segmentation Including Class Similarity

Matteo Sodano, Federico Magistri, Lucas Nunes, Jens Behley, Cyrill Stachniss

TL;DR

This work tackles open-world semantic segmentation by handling anomaly segmentation and novel class discovery jointly within a single, lightweight encoder–decoder network. It introduces a dual-decoder architecture: a semantic decoder learns a feature descriptor for each known class and pulls that class's features toward it, and a contrastive decoder, trained with the objectosphere and contrastive losses, separates unknown regions and enables pixel-level anomaly scoring. A post-processing phase clusters unknown pixels into new classes by storing their activation vectors and updating the class descriptors, while a Gaussian model per known class measures how closely each unknown relates to the known categories. Experiments on SegmentMeIfYouCan and BDDAnomaly show state-of-the-art anomaly segmentation performance, strong open-world segmentation capability, and credible class-similarity predictions, with ablations confirming the value of the feature-space losses and the Gaussian post-processing. Overall, the approach advances practical open-world scene understanding with robust detection of novel objects and similarity signals that downstream planning and mapping can use.

Abstract

Interpreting camera data is key for autonomously acting systems, such as autonomous vehicles. Vision systems that operate in real-world environments must be able to understand their surroundings and need the ability to deal with novel situations. This paper tackles open-world semantic segmentation, i.e., the variant of interpreting image data in which objects occur that have not been seen during training. We propose a novel approach that performs accurate closed-world semantic segmentation and, at the same time, can identify new categories without requiring any additional training data. Our approach additionally provides a similarity measure for every newly discovered class in an image to a known category, which can be useful information in downstream tasks such as planning or mapping. Through extensive experiments, we show that our model achieves state-of-the-art results on classes known from training data as well as for anomaly segmentation and can distinguish between different unknown classes.
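
The class-similarity measure described above can be illustrated with a minimal sketch. It assumes each known class is summarized by a mean vector and a diagonal variance in feature space, and it scores a feature from a newly discovered class by an unnormalized Gaussian of its Mahalanobis distance to each known class. The function names, the diagonal-variance simplification, and the use of an empirical mean in place of the learned class descriptor are assumptions made for illustration, not the paper's exact formulation.

```python
import numpy as np


def fit_class_gaussians(features, labels, num_classes, eps=1e-6):
    """Estimate a mean and a diagonal variance per known class.

    features: (N, D) array of feature vectors; labels: (N,) integer class ids.
    Assumption: the empirical mean stands in for the learned class descriptor.
    """
    dim = features.shape[1]
    means = np.zeros((num_classes, dim))
    variances = np.ones((num_classes, dim))
    for c in range(num_classes):
        class_feats = features[labels == c]
        if len(class_feats) > 0:
            means[c] = class_feats.mean(axis=0)
            # eps keeps the variance strictly positive for degenerate classes.
            variances[c] = class_feats.var(axis=0) + eps
    return means, variances


def class_similarity(feature, means, variances):
    """Score a (D,) feature against every class Gaussian.

    Returns exp(-0.5 * d^2) per class, where d is the Mahalanobis distance
    under the diagonal variance; values lie in [0, 1], 1 meaning "at the mean".
    """
    sq_dist = ((feature - means) ** 2 / variances).sum(axis=1)
    return np.exp(-0.5 * sq_dist)


if __name__ == "__main__":
    # Toy 2D features: class 0 around the origin, class 1 around (5, 5).
    feats = np.array([[0., 0.], [1., 1.], [-1., -1.],
                      [5., 5.], [6., 6.], [4., 4.]])
    labs = np.array([0, 0, 0, 1, 1, 1])
    means, variances = fit_class_gaussians(feats, labs, num_classes=2)
    scores = class_similarity(np.array([0.5, 0.5]), means, variances)
    print("most similar known class:", int(np.argmax(scores)))
```

In this sketch, the known class with the highest score would be reported as the most similar category for the new class; a full covariance matrix could replace the diagonal variance at higher cost.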

Paper Structure (23 sections, 15 equations, 8 figures, 12 tables)

Figures (8)

  • Figure 1: Given an image containing a previously unseen object (top), closed-world methods for semantic segmentation classify the pixels belonging to that object as one of the known classes (center, red circle). Our goal is to segment the unknown object and identify it as a semantic class different from the previously known ones (bottom, green circle).
  • Figure 2: Given an RGB image as input, our network processes it by means of an encoder and two decoders. The semantic decoder produces a closed-world semantic segmentation and a Gaussian model for each known category. The class Gaussian models are built from a learned class descriptor (mean) and the variance of all predictions from it. A 3D example is shown in the image. The contrastive decoder provides an anomaly segmentation output. A post-processing phase finally achieves open-world semantic segmentation.
  • Figure 3: 2D visualization of the expected output of the contrastive decoder. The behavior of the objectosphere loss is shown in A, where all points coming from known classes (black) lie around the red (outer) circle of radius $\xi$, see Eq. (\ref{eq:loss_objectosphere}), and the points from unknown classes lie around the origin. The contrastive loss is shown in B, where features lie on the unit circle. Together, they lead to a behavior similar to the one depicted in C.
  • Figure 4: Results from the validation set of SegmentMeIfYouCan. We show the input RGB overlaid with the ground truth unknown mask (a), the prediction of our closed-world model (b), and the prediction of our approach for open-world segmentation (c). In the open-world prediction, the unknown class is shown in red.
  • Figure 5: Anomaly segmentation results from the validation set of SegmentMeIfYouCan. We show the input RGB overlaid with the ground truth unknown mask (a), the prediction of our closed-world model (b), and the prediction of our approach for open-world segmentation (c). In the open-world prediction, the unknown class is shown in red. Notice how the two models, which are both trained on Cityscapes, perform similarly on known classes, demonstrating that our approach does not degrade closed-world performance.
  • ...and 3 more figures
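
The behavior that the Figure 3 caption attributes to the objectosphere loss (known-class features at a norm of about $\xi$, unknown features near the origin) can be sketched as follows. The sketch uses the commonly cited hinge form of the objectosphere loss, which penalizes known features only when their norm falls below $\xi$ and penalizes the squared norm of unknown features. The paper's own equation is not reproduced on this page and may differ in detail, so the exact form, the function name, and the default `xi` are assumptions.

```python
import numpy as np


def objectosphere_loss(features, is_known, xi=1.0):
    """Mean objectosphere-style loss over a batch of feature vectors.

    features: (N, D) array; is_known: (N,) boolean mask.
    Assumed form (may differ from the paper's equation):
      known sample:   max(xi - ||f||, 0)^2   (push the norm up to at least xi)
      unknown sample: ||f||^2                (pull the feature toward the origin)
    """
    norms = np.linalg.norm(features, axis=-1)
    known_term = np.maximum(xi - norms, 0.0) ** 2
    unknown_term = norms ** 2
    return np.where(is_known, known_term, unknown_term).mean()


if __name__ == "__main__":
    feats = np.array([[3.0, 4.0],   # known, norm 5: already beyond xi
                      [0.0, 0.0],   # unknown, at the origin
                      [0.6, 0.8]])  # known, norm 1: inside the xi circle
    known = np.array([True, False, True])
    print("loss:", objectosphere_loss(feats, known, xi=2.0))
```

Under this form, the feature norm itself can serve as a simple per-pixel cue: a small norm suggests an unknown region, which is consistent with the pixel-level anomaly scoring mentioned in the TL;DR.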