Reducing Texture Bias of Deep Neural Networks via Edge Enhancing Diffusion

Edgar Heinert, Matthias Rottmann, Kira Maag, Karsten Kahl

TL;DR

This work investigates texture bias in deep networks for semantic segmentation by using Edge Enhancing Diffusion (EED) to create texture-reduced image duplicates of Cityscapes and CARLA. It compares CNNs and vision transformers, showing CNNs are highly texture-biased while transformers are more robust to texture, and demonstrates that training on EED data yields models that largely ignore texture with manageable performance loss. The authors conduct classification and segmentation experiments, ablations on diffusion strength, segment-level analyses, and adversarial robustness tests, illustrating EED's value as both a diagnostic tool and a training-time regularizer. They also release a GPU-accelerated EED implementation to facilitate broader evaluation of texture bias in vision models.

Abstract

Convolutional neural networks (CNNs) for image processing tend to focus on localized texture patterns, a tendency commonly referred to as texture bias. While most previous work focuses on image classification, we go beyond this and study the texture bias of CNNs in semantic segmentation. In this work, we propose to train CNNs on pre-processed images with less texture to reduce the texture bias. The challenge here is to suppress image texture while preserving shape information. To this end, we utilize edge enhancing diffusion (EED), an anisotropic image diffusion method initially introduced for image compression, to create texture-reduced duplicates of existing datasets. Extensive numerical studies are performed with both CNNs and vision transformer models trained on original data and EED-processed data from the Cityscapes dataset and the CARLA driving simulator. We observe strong texture-dependence of CNNs and moderate texture-dependence of transformers. Training CNNs on EED-processed images enables the models to become completely ignorant with respect to texture, demonstrating resilience with respect to texture re-introduction to any degree. Additionally, we analyze the performance reduction in depth at the level of connected components in the semantic segmentation and study the influence of EED pre-processing on domain generalization as well as adversarial robustness.
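To make the idea of an anisotropic, edge-aware diffusion concrete, the following is a minimal NumPy sketch of a generic EED-style iteration on a single grayscale image. It is not the authors' GPU implementation and omits the orientation smoothing mentioned in the figure captions. The Charbonnier diffusivity, the explicit time stepping, and all names and parameters (`eed`, `lam`, `sigma`, `tau`, `steps`) are assumptions chosen for illustration. The diffusion tensor has eigenvalue 1 along edges and a reduced eigenvalue across them, so texture in flat regions is smoothed while diffusion across strong edges is damped.

```python
import numpy as np

def gaussian_blur(img, sigma):
    """Separable Gaussian blur with replicated borders."""
    radius = max(1, int(round(3 * sigma)))
    offsets = np.arange(-radius, radius + 1)
    kernel = np.exp(-0.5 * (offsets / sigma) ** 2)
    kernel /= kernel.sum()
    h, w = img.shape
    p = np.pad(img, radius, mode="edge")
    tmp = sum(k * p[:, i:i + w] for i, k in enumerate(kernel))
    return sum(k * tmp[i:i + h, :] for i, k in enumerate(kernel))

def grad(u):
    """Forward differences; the last column/row is set to zero."""
    ux = np.zeros_like(u)
    uy = np.zeros_like(u)
    ux[:, :-1] = u[:, 1:] - u[:, :-1]
    uy[:-1, :] = u[1:, :] - u[:-1, :]
    return ux, uy

def div(jx, jy):
    """Negative adjoint of grad, so every step conserves the image mean."""
    d = np.zeros_like(jx)
    d[:, :-1] += jx[:, :-1]
    d[:, 1:] -= jx[:, :-1]
    d[:-1, :] += jy[:-1, :]
    d[1:, :] -= jy[:-1, :]
    return d

def eed(img, steps=50, tau=0.2, lam=0.05, sigma=1.0):
    """Illustrative EED-style diffusion (assumed parameters, grayscale only).

    tau must stay below 0.25 for this explicit scheme to remain stable.
    """
    u = np.asarray(img, dtype=np.float64).copy()
    for _ in range(steps):
        # Edge direction is estimated from a pre-smoothed copy of the image.
        gx, gy = grad(gaussian_blur(u, sigma))
        mag2 = gx ** 2 + gy ** 2
        # Charbonnier diffusivity (an assumption): small across strong edges.
        g = 1.0 / np.sqrt(1.0 + mag2 / lam ** 2)
        # D = I - (1 - g) * n n^T with n the normalized smoothed gradient,
        # i.e. eigenvalue g across the edge and 1 along it.
        w = (1.0 - g) / (mag2 + 1e-12)
        a = 1.0 - w * gx * gx
        b = -w * gx * gy
        c = 1.0 - w * gy * gy
        ux, uy = grad(u)
        u += tau * div(a * ux + b * uy, b * ux + c * uy)
    return u
```

Calling `eed(image)` on a 2D float array returns an array of the same shape. Larger `steps` or `lam` remove more texture at the cost of blurrier edges, which loosely mirrors the role of the diffusion strength in the ablation described in the figure list.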

Paper Structure (8 sections, 6 equations, 7 figures, 4 tables)

Figures (7)

  • Figure 1: A Cityscapes image (top) and its EED-processed counterpart (bottom). Texture is removed to a great extent by EED while shapes and semantic meaning are preserved.
  • Figure 2: Visual comparison of an original Cityscapes image (left), EED without orientation smoothing (middle) and with orientation smoothing (right). Orientation smoothing preserves shapes while preventing circular singularities.
  • Figure 3: Visual comparison of DeepLabv3+ predictions for different combinations of training data and inferred data.
  • Figure 4: An ablation study on how the diffusion strength of the training data affects performance on differently diffused test sets. For each value $t$ on the x-axis, a CNN is trained with configuration $\mathrm{EED}(\mathrm{City},P_{\mathit{mild}}, t)$. Each CNN is evaluated on the four different datasets.
  • Figure 5: The segmentation performance of $f_{\mathrm{EED}}(\mathrm{EED})$ in comparison to $f_{\mathrm{City}}(\mathrm{City})$ as a function of the visibility of segment boundaries. The smaller $\overline{ \| \nabla_x B_i(\mathrm{EED}) \|_2 }$, the less visible the segment boundary.
  • ...and 2 more figures