PanSR: An Object-Centric Mask Transformer for Panoptic Segmentation
Lojze Žust, Matej Kristan
TL;DR
PanSR addresses core weaknesses of mask-transformer panoptic segmentation by introducing an object-centric pipeline. It combines an Object-Centric Proposal (OCP) module for robust thing proposals, proposal-aware matching to prevent false-positive (FP) drift and false-negative (FN) suppression, and object-centric mask prediction constrained by bounding boxes to reduce instance merging. Training uses mask-conditioned queries to simulate proposal noise, which improves robustness in varied scenes. Empirically, PanSR achieves a +3.4 PQ improvement over the state of the art on LaRS and reaches state-of-the-art performance on Cityscapes, with gains in small-object detection and crowded-scene handling.
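To make the idea of box-constrained mask prediction concrete, the following is a minimal sketch, not the PanSR implementation. It assumes the simplest possible reading of "constrained by bounding boxes": mask probabilities outside a query's box are zeroed. The function name `constrain_mask_to_box`, the box format `(x0, y0, x1, y1)` with exclusive upper bounds, and the toy data are all hypothetical.

```python
import numpy as np

def constrain_mask_to_box(mask_prob, box):
    """Keep mask probabilities inside the box, zero everything outside it.

    mask_prob: (H, W) array of per-pixel probabilities for one query.
    box: (x0, y0, x1, y1) in pixel coordinates, upper bounds exclusive
         (a hypothetical convention chosen for this sketch).
    """
    x0, y0, x1, y1 = box
    out = np.zeros_like(mask_prob)
    out[y0:y1, x0:x1] = mask_prob[y0:y1, x0:x1]
    return out

# Toy example: a single query whose mask fires on two well-separated objects.
mask_prob = np.zeros((6, 10))
mask_prob[1:4, 1:4] = 0.9  # object A (the one the query should cover)
mask_prob[1:4, 6:9] = 0.8  # object B (spuriously merged into the same mask)

# A box around object A removes the response on object B.
constrained = constrain_mask_to_box(mask_prob, (0, 0, 5, 6))
merged_pixels = int((mask_prob > 0.5).sum())
kept_pixels = int((constrained > 0.5).sum())
```

In this toy case the unconstrained mask covers both objects, while the constrained mask keeps only the pixels of the object inside the box. The actual method may apply the constraint differently, for example inside the mask decoder rather than as a post-hoc crop.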
Abstract
Panoptic segmentation is a fundamental task in computer vision and a crucial component for perception in autonomous vehicles. Recent mask-transformer-based methods achieve impressive performance on standard benchmarks but face significant challenges with small objects, crowded scenes and scenes exhibiting a wide range of object scales. We identify several fundamental shortcomings of current approaches: (i) the query proposal generation process is biased towards larger objects, resulting in missed smaller objects, (ii) initially well-localized queries may drift to other objects, resulting in missed detections, and (iii) spatially well-separated instances may be merged into a single mask, causing inconsistent and false scene interpretations. To address these issues, we rethink the individual components of the network and its supervision, and propose PanSR, a novel method for panoptic segmentation. PanSR effectively mitigates instance merging, enhances small-object detection and increases performance in crowded scenes, delivering a notable +3.4 PQ improvement over the state of the art on the challenging LaRS benchmark, while reaching state-of-the-art performance on Cityscapes. The code and models will be publicly available at https://github.com/lojzezust/PanSR.
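For readers unfamiliar with the metric behind the reported gain, the sketch below computes panoptic quality (PQ) for a single class using the standard definition: the sum of IoUs over true-positive matches divided by |TP| + 0.5|FP| + 0.5|FN|, where a predicted and a ground-truth segment match if their IoU exceeds 0.5. The function name and its inputs are illustrative assumptions; benchmark toolkits additionally average PQ over classes and usually report it in percent, so this is not the evaluation code used for LaRS or Cityscapes.

```python
def panoptic_quality(candidate_ious, num_pred, num_gt, iou_thresh=0.5):
    """Single-class PQ from candidate (prediction, ground truth) pair IoUs.

    candidate_ious: IoU of each candidate pair, with every segment
        appearing in at most one pair (hypothetical input format).
    num_pred: total number of predicted segments of this class.
    num_gt: total number of ground-truth segments of this class.
    """
    # Pairs above the threshold are true positives.
    tp_ious = [iou for iou in candidate_ious if iou > iou_thresh]
    tp = len(tp_ious)
    fp = num_pred - tp  # predictions left unmatched
    fn = num_gt - tp    # ground-truth segments left unmatched
    denom = tp + 0.5 * fp + 0.5 * fn
    if denom == 0:
        return 0.0  # no segments at all; convention chosen for this sketch
    return sum(tp_ious) / denom

# Toy example: 4 predictions, 3 ground-truth objects, two pairs overlap well.
pq = panoptic_quality([0.9, 0.7, 0.3], num_pred=4, num_gt=3)
```

Because missed objects (FN) and drifted or spurious masks (FP) both enlarge the denominator, the failure modes listed in the abstract lower PQ directly, independent of how accurate the remaining masks are.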
