Benchmarking Segmentation Models with Mask-Preserved Attribute Editing

Zijin Yin, Kongming Liang, Bing Li, Zhanyu Ma, Jun Guo

TL;DR

This work addresses the need to evaluate segmentation robustness to both local and global attribute variations by introducing a mask-preserved attribute editing pipeline, which edits attributes in real images while keeping the segmentation masks intact. The authors build Pascal-EA, a robustness benchmark derived from Pascal VOC that covers object-level attributes (color, material, pattern) and image-level styles (snow, painting, sketch). A diffusion-based editing framework with mask-guided attention and ControlNet produces edited images that preserve the original layout, enabling reliable evaluation of diverse segmentation models, including open-vocabulary approaches. The key findings are that local attribute changes are as influential as global style shifts, with material alterations having the strongest impact, and that stronger models are not automatically more robust, which underscores the need for attribute-aware evaluation and robustness improvements.

Abstract

When deploying segmentation models in practice, it is critical to evaluate their behavior in varied and complex scenes. Unlike previous evaluation paradigms, which consider only global attribute variations (e.g., adverse weather), we investigate both local and global attribute variations for robustness evaluation. To achieve this, we construct a mask-preserved attribute editing pipeline that edits the visual attributes of real images with precise control of structural information, so the original segmentation labels can be reused for the edited images. Using our pipeline, we construct a benchmark covering both object and image attributes (e.g., color, material, pattern, style). We evaluate a broad variety of semantic segmentation models, ranging from conventional closed-set models to recent open-vocabulary large models, on their robustness to different types of variations. We find that both local and global attribute variations affect segmentation performance, and that the sensitivity of models differs across variation types. We argue that local attributes are as important as global attributes and should be considered in the robustness evaluation of segmentation models. Code: https://github.com/PRIS-CV/Pascal-EA.

Paper Structure (11 sections, 4 equations, 5 figures, 6 tables)

This paper contains 11 sections, 4 equations, 5 figures, 6 tables.

Figures (5)

  • Figure 1: Illustration of the motivation of our work. Left: Our mask-preserved attribute editing pipeline generates test images with various attribute changes for evaluating the robustness of segmentation methods to attribute variations. Right: The average performance drop of segmentation models on our generated data, which shows their sensitivity to different types of attribute variations.
  • Figure 2: Illustration of our mask-preserved attribute editing pipeline. (1) We construct the Attribute Set, which defines local and global variations. (2) We edit real images with different attribute variations, using a large language model together with a diffusion model. (3) The robustness of segmentation models can then be evaluated on the edited images against various types of attribute variations.
  • Figure 3: Illustration of a block in our method. Our attention mechanism uses an object mask to rectify the attention maps in self-attention and cross-attention layers. We adopt a ControlNet block to further restrict the semantic layout.
  • Figure 4: Average mIoU ($\uparrow$) of all segmentation methods under the combination of two different attribute variations.
  • Figure 5: Qualitative comparison of edited images. Left: Manipulating object color in Pascal VOC. Right: Changing to a snowy day in Cityscapes. Our method achieves the best performance in structure preservation and object consistency.
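
The evaluation idea behind Figures 1 and 4 can be illustrated with a small sketch: because editing preserves the masks, the same ground-truth labels score predictions on both the original and the edited images, and the difference in mIoU is the performance drop. The code below is a minimal illustration under stated assumptions, not the authors' implementation. The function names (`confusion_matrix`, `mean_iou`, `miou_drop`), the flat-list mask representation, and the use of label 255 as an ignored "void" label (the Pascal VOC convention) are all choices made for this example.

```python
from typing import Dict, List, Sequence

IGNORE_LABEL = 255  # assumption: Pascal VOC style "void" label, skipped in scoring


def confusion_matrix(gt: Sequence[int], pred: Sequence[int],
                     num_classes: int) -> List[List[int]]:
    """Count pixels per (ground-truth class, predicted class) pair."""
    cm = [[0] * num_classes for _ in range(num_classes)]
    for g, p in zip(gt, pred):
        if g == IGNORE_LABEL:
            continue
        cm[g][p] += 1
    return cm


def mean_iou(cm: List[List[int]]) -> float:
    """Mean IoU over the classes that occur in ground truth or predictions."""
    n = len(cm)
    ious = []
    for c in range(n):
        tp = cm[c][c]
        fn = sum(cm[c]) - tp
        fp = sum(cm[r][c] for r in range(n)) - tp
        denom = tp + fp + fn
        if denom > 0:  # skip classes absent from both gt and predictions
            ious.append(tp / denom)
    return sum(ious) / len(ious) if ious else 0.0


def miou_drop(gt_masks: Sequence[Sequence[int]],
              preds_original: Sequence[Sequence[int]],
              preds_edited: Sequence[Sequence[int]],
              num_classes: int) -> Dict[str, float]:
    """Score both prediction sets against the SAME ground-truth masks."""
    def dataset_miou(preds: Sequence[Sequence[int]]) -> float:
        total = [[0] * num_classes for _ in range(num_classes)]
        for gt, pred in zip(gt_masks, preds):
            cm = confusion_matrix(gt, pred, num_classes)
            for r in range(num_classes):
                for c in range(num_classes):
                    total[r][c] += cm[r][c]
        return mean_iou(total)

    original = dataset_miou(preds_original)
    edited = dataset_miou(preds_edited)
    return {"miou_original": original, "miou_edited": edited,
            "drop": original - edited}


if __name__ == "__main__":
    # Toy example: one 4-pixel image with two classes (flattened masks).
    gt = [[0, 0, 1, 1]]
    pred_on_original = [[0, 0, 1, 1]]  # perfect on the original image
    pred_on_edited = [[0, 1, 1, 1]]    # one pixel flips after editing
    print(miou_drop(gt, pred_on_original, pred_on_edited, num_classes=2))
```

In this toy case the model is perfect on the original image, and a single flipped pixel on the edited image lowers the IoU of both classes. Accumulating one confusion matrix over the whole dataset before computing IoU is a common convention for semantic segmentation, but whether the benchmark aggregates this way is an assumption here.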