Renovating Names in Open-Vocabulary Segmentation Benchmarks
Haiwen Huang, Songyou Peng, Dan Zhang, Andreas Geiger
TL;DR
RENOVATE tackles name misalignment in open-vocabulary segmentation by automatically generating context-rich candidate names and learning a per-segment renaming model that aligns visual segments with refined text labels. The approach combines context-noun augmentation and GPT-4-based candidate generation, a CLIP-based transformer decoder with feedforward attention biased by ground-truth masks, and negative sampling to produce high-quality, segment-level names. Empirical results show that RENOVATE improves open-vocabulary generalization (gains of up to about 4 PQ and 5 mIoU) and data efficiency, while enabling fine-grained evaluation via semantic-name similarity metrics that reveal benign misclassifications and model biases. The work demonstrates practical benefits for relabeling datasets such as COCO, ADE20K, and Cityscapes, and provides resources for improved benchmarking and dataset curation in vision-language segmentation.
Abstract
Names are essential to both human cognition and vision-language models. Open-vocabulary models utilize class names as text prompts to generalize to categories unseen during training. However, the precision of these names is often overlooked in existing datasets. In this paper, we address this underexplored problem by presenting a framework for "renovating" names in open-vocabulary segmentation benchmarks (RENOVATE). Our framework features a renaming model that enhances the quality of names for each visual segment. Through experiments, we demonstrate that our renovated names help train stronger open-vocabulary models with up to 15% relative improvement and significantly enhance training efficiency with improved data quality. We also show that our renovated names improve evaluation by better measuring misclassification and enabling fine-grained model analysis. We will provide our code and relabelings for several popular segmentation datasets (MS COCO, ADE20K, Cityscapes) to the research community.
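To illustrate how finer-grained names could make misclassifications measurable rather than binary, the following is a minimal sketch of similarity-weighted scoring. It is not the paper's evaluation protocol: the function names (`cosine`, `soft_score`, `mean_soft_score`) and the toy three-dimensional embeddings are hypothetical, and a real setup would presumably obtain name embeddings from a text encoder such as CLIP.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Hypothetical toy embeddings, for illustration only.
# In practice these would come from a text encoder applied to each name.
NAME_EMBEDDINGS = {
    "sedan": [1.0, 0.0, 0.0],
    "taxi": [0.8, 0.6, 0.0],
    "tree": [0.0, 0.0, 1.0],
}

def soft_score(predicted, target, embeddings=NAME_EMBEDDINGS):
    """Credit in [0, 1]: 1 for an exact name match, otherwise the
    (non-negative) cosine similarity between the two name embeddings."""
    if predicted == target:
        return 1.0
    return max(0.0, cosine(embeddings[predicted], embeddings[target]))

def mean_soft_score(pairs, embeddings=NAME_EMBEDDINGS):
    """Average soft score over (predicted_name, target_name) pairs."""
    return sum(soft_score(p, t, embeddings) for p, t in pairs) / len(pairs)

if __name__ == "__main__":
    pairs = [("sedan", "sedan"), ("taxi", "sedan"), ("tree", "sedan")]
    for predicted, target in pairs:
        print(predicted, target, round(soft_score(predicted, target), 3))
    print("mean:", round(mean_soft_score(pairs), 3))
```

Under these toy embeddings, predicting "taxi" for a "sedan" segment receives partial credit while predicting "tree" receives none, which is the sense in which a near-miss can be treated as a benign misclassification.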
