Renovating Names in Open-Vocabulary Segmentation Benchmarks

Haiwen Huang, Songyou Peng, Dan Zhang, Andreas Geiger

TL;DR

RENOVATE tackles the naming misalignment in open-vocabulary segmentation by automatically generating context-rich candidate names and learning a per-segment renaming model that aligns visual segments with refined text labels. The approach uses context-noun augmentation and GPT-4-based candidate generation, a CLIP-enabled transformer decoder with feedforward attention biased by ground-truth masks, and negative sampling to produce high-quality, segment-level names. Empirical results show RENOVATE improves open-vocabulary generalization (up to ~4 PQ and ~5 mIoU gains) and data efficiency, while enabling fine-grained evaluation via semantic-name similarity metrics that reveal benign misclassifications and model biases. The work demonstrates practical benefits for relabeling datasets like COCO, ADE20K, and Cityscapes and provides resources for improved benchmarking and dataset curation in vision-language segmentation.

Abstract

Names are essential to both human cognition and vision-language models. Open-vocabulary models utilize class names as text prompts to generalize to categories unseen during training. However, the precision of these names is often overlooked in existing datasets. In this paper, we address this underexplored problem by presenting a framework for "renovating" names in open-vocabulary segmentation benchmarks (RENOVATE). Our framework features a renaming model that enhances the quality of names for each visual segment. Through experiments, we demonstrate that our renovated names help train stronger open-vocabulary models with up to 15% relative improvement and significantly enhance training efficiency with improved data quality. We also show that our renovated names improve evaluation by better measuring misclassification and enabling fine-grained model analysis. We will provide our code and relabelings for several popular segmentation datasets (MS COCO, ADE20K, Cityscapes) to the research community.

Paper Structure (29 sections, 3 equations, 19 figures, 7 tables)

Figures (19)

  • Figure 1: Problems of names in current segmentation benchmarks. We demonstrate examples from well-known datasets: MS COCO [Lin2014ECCV], ADE20K [Zhou2017CVPRb], and Cityscapes [cordts2016cityscapes]. Our renovated names are visually more aligned and help models generalize better.
  • Figure 2: Overview of candidate name generation and renaming model training. We generate candidate names based on the context names and train the renaming model to match them with the segments. For clarity, we show only one segment. In practice, multiple segments are trained jointly, each paired with the text queries.
  • Figure 3: Obtaining renovated names. In (a) we illustrate how we use the renaming model to obtain a renovated name for each segment. In (b) we demonstrate that the renaming results help with dataset analysis, using examples from the "person" class.
  • Figure 4: Examples of renovated names on segments from the validation sets of ADE20K and Cityscapes. For each segment, we show the original name below the image and the renovated name in the text box. See more visual results in the supplements.
  • Figure 5: MS COCO $\rightarrow$ ADE20K.
  • ...and 14 more figures