Reducing Texture Bias of Deep Neural Networks via Edge Enhancing Diffusion
Edgar Heinert, Matthias Rottmann, Kira Maag, Karsten Kahl
TL;DR
This work investigates texture bias in deep networks for semantic segmentation, using Edge Enhancing Diffusion (EED) to create texture-reduced duplicates of images from Cityscapes and CARLA. It compares CNNs and vision transformers, finding that CNNs depend strongly on texture while transformers depend on it only moderately, and shows that training on EED data yields models that largely ignore texture at a manageable cost in performance. The authors conduct classification and segmentation experiments, ablations on diffusion strength, segment-level analyses, and adversarial robustness tests, illustrating EED's value as both a diagnostic tool and a training-time regularizer. They also release a GPU-accelerated EED implementation to facilitate broader evaluation of texture bias in vision models.
Abstract
Convolutional neural networks (CNNs) for image processing tend to focus on localized texture patterns, a tendency commonly referred to as texture bias. While most previous work focuses on image classification, we go beyond this and study the texture bias of CNNs in semantic segmentation. We propose to reduce texture bias by training CNNs on pre-processed images that contain less texture. The challenge is to suppress image texture while preserving shape information. To this end, we utilize edge enhancing diffusion (EED), an anisotropic image diffusion method initially introduced for image compression, to create texture-reduced duplicates of existing datasets. Extensive numerical studies are performed with both CNNs and vision transformer models trained on original data and EED-processed data from the Cityscapes dataset and the CARLA driving simulator. We observe strong texture dependence of CNNs and moderate texture dependence of transformers. Training CNNs on EED-processed images enables the models to become completely ignorant of texture, remaining resilient when texture is re-introduced to any degree. Additionally, we analyze the performance reduction in depth at the level of connected components in the semantic segmentation and study the influence of EED pre-processing on domain generalization as well as adversarial robustness.
