Shape Bias and Robustness Evaluation via Cue Decomposition for Image Classification and Segmentation

Edgar Heinert, Thomas Gottwald, Annika Mütze, Matthias Rottmann

TL;DR

The paper tackles the problem of quantifying shape versus texture biases in DNNs for image classification and semantic segmentation. It proposes an AI-free cue-decomposition pipeline that creates separate shape (via edge-enhancing diffusion) and texture (via Voronoi shuffling) versions of images, and defines two metrics, $S_{\mathrm{cd}}$ (shape bias) and $R_{\mathrm{cd}}$ (robustness), to evaluate models. Across 60+ pre-trained networks on ImageNet and semantic segmentation datasets Cityscapes and ADE20k, the approach yields strong alignment with traditional cue-conflict shape-bias measurements while delivering superior robustness prediction, and offers the first broad analysis of segmentation biases. The work provides practical, extensible tools and a public leaderboard, revealing architecture-dependent differences (e.g., ViTs and VLMs showing more shape bias) and significant implications for model robustness in real-world vision tasks.

Abstract

Previous work has studied how deep neural networks (DNNs) perceive image content in terms of their biases towards different image cues, such as texture and shape. Previous methods to measure shape and texture biases are typically style-transfer-based and limited to DNNs for image classification. In this work, we provide a new evaluation procedure consisting of 1) a cue-decomposition method that comprises two AI-free data pre-processing methods extracting shape and texture cues, respectively, and 2) a novel cue-decomposition shape bias evaluation metric that leverages the cue-decomposition data. For application purposes, we introduce a corresponding cue-decomposition robustness metric that allows for the estimation of the robustness of a DNN w.r.t. image corruptions. In our numerical experiments, our findings for biases in image classification DNNs align with those of previous evaluation metrics. However, our cue-decomposition robustness metric shows superior results in terms of estimating the robustness of DNNs. Furthermore, our results for DNNs on the semantic segmentation datasets Cityscapes and ADE20k shed light, for the first time, on the biases of semantic segmentation DNNs.

Paper Structure

This paper contains 27 sections, 6 equations, 6 figures, and 14 tables.

Figures (6)

  • Figure 1: Decomposition of an ImageNet image (left) into its shape cue via Edge Enhancing Diffusion (center left) and texture cue via Voronoi patch shuffling (center right) with the corresponding semantic segmentation mask for the Voronoi-shuffled image (right).
  • Figure 2: Voronoi shuffling method: The image is decomposed into $N$ Voronoi cells. Each Voronoi cell is filled with random crops of the original image by randomly shifting the cell on the image.
  • Figure 3: Simple corruptions of an ImageNet image, from left to right: original; contrast level $0.2$; high-pass difference between original and Gaussian blurring with standard deviation $1.5$; low-pass Gaussian blurring with standard deviation $8$; Gaussian noise uniformly sampled from $[0,0.6]$; $90$-degree phase noise.
  • Figure 4: Studied AI-free texture cue extraction candidates for an ImageNet image, from left to right: original; patch shuffling; diamond shuffling; Voronoi shuffling; difference of original and EED; the patch-shuffled version of that difference.
  • Figure 5: Examples of the studied cue-conflict candidates for ImageNet, from left to right: style transfer of a cat with an elephant texture; blend of an EED airplane image with a Voronoi-shuffled image of a bear; sum of an EED cat image with the difference of an original image of an elephant and its EED version; sum of an EED image of a car ($\gamma_S = 1$) with the patch-shuffled difference of an original dog image and its EED version with $\gamma_T = 2$.
  • ...and 1 more figure
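
The Voronoi shuffling step described in the Figure 2 caption can be illustrated with a short NumPy sketch. It is based only on that caption, so it may differ from the authors' implementation. The function name `voronoi_shuffle`, the parameters `n_cells` and `seed`, and the uniform sampling of seed points are assumptions made for illustration. Each pixel is assigned to its nearest seed point, and each resulting cell is filled with the image content found under a randomly shifted copy of that cell.

```python
import numpy as np


def voronoi_shuffle(image, n_cells=64, seed=0):
    """Illustrative Voronoi shuffling (hypothetical helper, not the authors' code).

    image: array of shape (H, W) or (H, W, C).
    Returns the shuffled image and the (H, W) map of Voronoi cell indices.
    """
    rng = np.random.default_rng(seed)
    h, w = image.shape[:2]

    # Assumption: seed points are drawn uniformly over the pixel grid.
    seeds_y = rng.integers(0, h, size=n_cells)
    seeds_x = rng.integers(0, w, size=n_cells)

    # Assign every pixel to its nearest seed point (squared Euclidean distance).
    ys, xs = np.mgrid[0:h, 0:w]
    dist2 = (ys[..., None] - seeds_y) ** 2 + (xs[..., None] - seeds_x) ** 2
    labels = dist2.argmin(axis=-1)

    out = np.empty_like(image)
    for k in range(n_cells):
        cy, cx = np.nonzero(labels == k)
        if cy.size == 0:
            continue  # a seed can end up without pixels, e.g. duplicate seeds

        # Draw a shift that keeps the whole shifted cell inside the image.
        dy = rng.integers(-cy.min(), h - cy.max())
        dx = rng.integers(-cx.min(), w - cx.max())

        # Fill the cell with the original content under the shifted cell.
        out[cy, cx] = image[cy + dy, cx + dx]

    return out, labels


if __name__ == "__main__":
    demo = np.random.default_rng(1).integers(0, 256, size=(64, 64, 3), dtype=np.uint8)
    shuffled, cells = voronoi_shuffle(demo, n_cells=16)
    print(shuffled.shape, cells.shape)
```

The distance computation builds an array of shape (H, W, `n_cells`), which is fine for ImageNet-sized inputs but would need a chunked or tree-based nearest-neighbour lookup for large images with many cells.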