PanSR: An Object-Centric Mask Transformer for Panoptic Segmentation

Lojze Žust, Matej Kristan

TL;DR

PanSR addresses core weaknesses of mask-transformer panoptic segmentation by introducing an object-centric pipeline. It combines an Object-Centric Proposal (OCP) module for robust thing proposals, proposal-aware matching to reduce false-positive and false-negative matches, and object-centric mask prediction that limits thing masks to their predicted bounding boxes to reduce instance merging. During training, mask-conditioned queries simulate proposal noise, making the model robust to imperfect proposals. PanSR improves PQ by +3.4 over the state of the art on LaRS and reaches state-of-the-art performance on Cityscapes, with gains in small-object detection and crowded-scene handling.

Abstract

Panoptic segmentation is a fundamental task in computer vision and a crucial component for perception in autonomous vehicles. Recent mask-transformer-based methods achieve impressive performance on standard benchmarks but face significant challenges with small objects, crowded scenes and scenes exhibiting a wide range of object scales. We identify several fundamental shortcomings of the current approaches: (i) the query proposal generation process is biased towards larger objects, resulting in missed smaller objects, (ii) initially well-localized queries may drift to other objects, resulting in missed detections, (iii) spatially well-separated instances may be merged into a single mask, causing inconsistent and false scene interpretations. To address these issues, we rethink the individual components of the network and its supervision, and propose PanSR, a novel method for panoptic segmentation. PanSR effectively mitigates instance merging, enhances small-object detection and increases performance in crowded scenes, delivering a notable +3.4 PQ improvement over the state of the art on the challenging LaRS benchmark, while reaching state-of-the-art performance on Cityscapes. The code and models will be publicly available at https://github.com/lojzezust/PanSR.
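
For readers unfamiliar with the PQ metric quoted above, the following is a minimal sketch of the standard per-class panoptic quality computation: the sum of IoUs over matched (true-positive) segment pairs, divided by TP + 0.5 FP + 0.5 FN, where a predicted and a ground-truth segment are matched only if their IoU exceeds 0.5. The definition is the commonly used one and is not taken from this page; the function name and the example numbers are hypothetical and purely illustrative.

```python
def panoptic_quality(matched_ious, num_fp, num_fn):
    """Per-class PQ = sum(IoU of TP matches) / (TP + 0.5 * FP + 0.5 * FN).

    matched_ious: IoU of each matched prediction/ground-truth pair (each > 0.5).
    num_fp: predicted segments without a matching ground-truth segment.
    num_fn: ground-truth segments without a matching prediction.
    """
    for iou in matched_ious:
        if not 0.5 < iou <= 1.0:
            raise ValueError("a matched pair must have IoU in (0.5, 1.0]")
    tp = len(matched_ious)
    denom = tp + 0.5 * num_fp + 0.5 * num_fn
    if denom == 0:
        return 0.0  # class absent from both prediction and ground truth
    return sum(matched_ious) / denom


# Illustrative numbers only (not results from the paper).
baseline = panoptic_quality([0.9, 0.8, 0.7], num_fp=1, num_fn=1)
# Same matches, but two additional small objects are missed entirely.
with_misses = panoptic_quality([0.9, 0.8, 0.7], num_fp=1, num_fn=3)
print(f"PQ baseline: {baseline:.3f}, PQ with extra misses: {with_misses:.3f}")
```

Because every missed object adds 0.5 to the denominator regardless of its size, missed small objects and merged instances lower PQ directly, which is why the failure modes listed in the abstract matter for this metric. Benchmarks usually report PQ in percentage points, averaged over classes.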

Paper Structure

This paper contains 22 sections, 10 equations, 11 figures, 4 tables.

Figures (11)

  • Figure 1: Recent transformer-based methods for panoptic segmentation rely on a simple query proposal approach (top left) and struggle with instance separation (bottom left). PanSR presents object-centric query proposals (top right) and reworks the mask decoding process for thing classes, leading to significant improvements in instance segmentation (bottom right).
  • Figure 2: Failure cases of mask transformers: well-initialized queries drift away from original objects during decoder iterations (top), and well-separated objects become merged by predicted segmentation masks (bottom). Colors indicate instance labels.
  • Figure 3: Architecture of PanSR. The backbone features are processed by a transformer encoder into a feature pyramid. Object-Centric Proposal extractor (OCP) is used to obtain thing queries. A transformer decoder refines the instance queries. The predicted masks of thing classes are limited by their predicted bounding boxes. Mask-conditioned queries (MC) ensure robustness to noise in proposal extraction.
  • Figure 4: Architecture of the OCP module head.
  • Figure 5: The proposal-aware matching scheme (left) alleviates the problem of false-negative and false-positive matches. Mask-conditioned queries (right) simulate random variation in content and positional queries of proposals during training.
  • ...and 6 more figures
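
The Figure 3 caption notes that predicted masks of thing classes are limited by their predicted bounding boxes. A minimal sketch of that idea follows, assuming a hard crop of a binary mask to an axis-aligned box; how PanSR actually applies the constraint (for example on logits, or with a soft weighting) is not stated here, and the function name `limit_mask_to_box` and the box convention are hypothetical.

```python
def limit_mask_to_box(mask, box):
    """Zero out every mask pixel outside the box.

    mask: list of rows, each a list of 0/1 values.
    box: (x0, y0, x1, y1) in pixel coordinates, end-exclusive (assumed convention).
    """
    x0, y0, x1, y1 = box
    return [
        [v if (x0 <= x < x1 and y0 <= y < y1) else 0 for x, v in enumerate(row)]
        for y, row in enumerate(mask)
    ]


# One predicted mask that wrongly covers two well-separated objects.
merged_mask = [
    [1, 1, 0, 0, 1, 1],
    [1, 1, 0, 0, 1, 1],
    [0, 0, 0, 0, 0, 0],
]
# Hypothetical predicted box around the left object only.
left_box = (0, 0, 2, 2)

for row in limit_mask_to_box(merged_mask, left_box):
    print(row)
```

In this toy case the pixels of the second object fall outside the box and are dropped, so the mask can no longer merge the two instances, which is the effect the caption describes.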