Measuring the Effect of Background on Classification and Feature Importance in Deep Learning for AV Perception
Anne Sielemann, Valentin Barner, Stefan Wolf, Masoud Roschani, Jens Ziehn, Juergen Beyerer
TL;DR
This study interrogates how background information influences classification and feature attributions in deep learning for traffic sign recognition using six carefully crafted synthetic datasets that independently vary background correlations and camera variation. By applying Kernel SHAP and Grad-CAM and introducing a pixel-ratio metric, it quantifies background emphasis and links it to training data properties and performance. The findings show that background correlation tends to increase background-feature importance, while camera variation has weaker effects, and shape diversity can elevate background attention; ConvNeXt architectures can even benefit from background cues when evaluated on in-domain data. Overall, the work demonstrates the value of synthetic data for objective XAI evaluation and provides publicly accessible datasets to advance research in explainable AI for AV perception.
Abstract
Common approaches to explainable AI (XAI) for deep learning focus on analyzing the importance of input features for the classification task in a given model: saliency methods like SHAP and Grad-CAM are used to measure the impact of spatial regions of the input image on the classification result. Combined with ground truth information about the location of the object in the input image (e.g., a binary mask), these measurements indicate whether object pixels had a high impact on the classification result or whether the classification focused on background pixels. The former is considered to be a sign of a healthy classifier, whereas the latter is assumed to suggest overfitting on spurious correlations. A major challenge, however, is that these intuitive interpretations are difficult to test quantitatively, and hence the output of such explanations lacks an explanation itself. One particular reason is that correlations in real-world data are difficult to avoid, and whether they are spurious or legitimate is debatable. Synthetic data, in turn, make it possible to actively enable or disable correlations where desired, but often lack a sufficient quantification of realism and stochastic properties. [...] Therefore, we systematically generate six synthetic datasets for the task of traffic sign recognition, which differ only in their degree of camera variation and background correlation [...] to quantify the isolated influence of background correlation, different levels of camera variation, and considered traffic sign shapes on classification performance as well as on background feature importance. [...] Results include a quantification of when and how much background features gain importance to support the classification task based on changes in the training domain [...]. Download: synset.de/datasets/synset-signset-ger/background-effect
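To make the idea of combining a saliency map with a binary object mask concrete, the following is a minimal Python sketch. It is not the paper's pixel-ratio metric, whose exact definition is not given in this summary; it assumes one plausible formulation, namely the share of total absolute attribution that falls on background pixels. The function name `background_attribution_ratio` and the toy arrays are hypothetical and chosen only for illustration.

```python
import numpy as np


def background_attribution_ratio(saliency, object_mask):
    """Share of total absolute attribution that lies on background pixels.

    Hypothetical formulation for illustration only; the paper's own
    pixel-ratio metric may be defined differently.

    saliency:    2D array of per-pixel attributions (e.g., from SHAP or Grad-CAM).
    object_mask: 2D boolean array, True where the pixel belongs to the object.
    """
    magnitude = np.abs(np.asarray(saliency, dtype=float))
    mask = np.asarray(object_mask, dtype=bool)
    if magnitude.shape != mask.shape:
        raise ValueError("saliency and object_mask must have the same shape")

    total = magnitude.sum()
    if total == 0.0:
        # No attribution anywhere: the ratio is undefined.
        return float("nan")

    # ~mask selects the background pixels.
    return float(magnitude[~mask].sum() / total)


# Toy 2x2 example: one object pixel carrying attribution 1.0,
# one background pixel carrying attribution 3.0.
toy_saliency = np.array([[0.0, 1.0],
                         [3.0, 0.0]])
toy_mask = np.array([[False, True],
                     [False, False]])

ratio = background_attribution_ratio(toy_saliency, toy_mask)
print(ratio)  # 3.0 / 4.0 = 0.75
```

Under this assumed definition, a value near 0 means the attribution is concentrated on the object, while a value near 1 means the classifier's attribution lies mostly on the background. Taking absolute values treats positive and negative attributions alike, which is a design choice of this sketch rather than something stated in the source.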
