On Background Bias of Post-Hoc Concept Embeddings in Computer Vision DNNs

Gesina Schwalbe, Georgii Mikriukov, Edgar Heinert, Stavros Gerolymatos, Mert Keser, Alois Knoll, Matthias Rottmann, Annika Mütze

TL;DR

This work investigates whether post-hoc concept embeddings (CEs) used in concept-based XAI (C-XAI) for computer vision are biased by image backgrounds. It applies three low-cost background-randomization techniques (image pasting with Places205 backgrounds, Voronoi-patched backgrounds, and diffusion-generated backgrounds) both to test robustness and to train CEs. Across >50 concepts, two datasets, and seven architectures, the study finds notable background biases (e.g., road scenes degrade animal concept segmentation) and shows that background-randomized training improves background robustness, with LoCE and its globalized variant (GloCE) often outperforming the Net2Vec baseline. The results demonstrate a practical workflow for bias discovery and mitigation in post-hoc C-XAI, suggesting that even cheap interventions can substantially improve the reliability of explanations in safety-critical CV tasks.

Abstract

The thriving research field of concept-based explainable artificial intelligence (C-XAI) investigates how human-interpretable semantic concepts are embedded in the latent spaces of deep neural networks (DNNs). Post-hoc approaches in this field use a set of examples to specify a concept and determine its embedding in the DNN latent space using data-driven techniques. This has proved useful for uncovering biases between different target (foreground or concept) classes. However, given that the background is mostly uncontrolled during training, an important question has so far been left unaddressed: Are state-of-the-art, data-driven post-hoc C-XAI approaches themselves prone to biases with respect to their backgrounds, and if so, to what extent? For example, wild animals mostly occur against vegetation backgrounds and seldom appear on roads. Even simple and robust C-XAI methods might exploit this shortcut for enhanced performance. A dangerous performance degradation on the concept corner case of animals on the road could thus remain undiscovered. This work investigates and confirms that established Net2Vec-based concept segmentation techniques frequently capture background biases, including alarming ones such as underperformance on road scenes. For the analysis, we compare 3 established techniques from the domain of background randomization on >50 concepts from 2 datasets and 7 diverse DNN architectures. Our results indicate that even low-cost setups can provide both valuable insight and improved background robustness.
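
As a rough illustration of the two cheapest background randomization variants named in the TL;DR (pasting a masked foreground onto a replacement background, and composing that background from Voronoi patches), the following NumPy sketch may help. It is not the authors' implementation: the function names `paste_foreground` and `voronoi_background`, the number of cells, and the constant-color stand-ins for Places205 images are all assumptions made for this example.

```python
import numpy as np

def paste_foreground(image, mask, background):
    """Keep image pixels where mask is True; take all others from background."""
    # image, background: (H, W, 3) arrays; mask: (H, W) boolean foreground mask
    return np.where(mask[..., None], image, background)

def voronoi_background(backgrounds, height, width, n_cells, rng):
    """Compose one background from patches of several images (Voronoi partition)."""
    # Random seed points; every pixel belongs to the cell of its nearest seed.
    seeds = rng.uniform([0, 0], [height, width], size=(n_cells, 2))
    ys, xs = np.mgrid[0:height, 0:width]
    pixels = np.stack([ys, xs], axis=-1).astype(float)               # (H, W, 2)
    dists = np.linalg.norm(pixels[:, :, None, :] - seeds, axis=-1)   # (H, W, n_cells)
    cell = dists.argmin(axis=-1)                                     # (H, W)
    # Each cell is filled from one randomly chosen source background.
    source = rng.integers(0, len(backgrounds), size=n_cells)
    stacked = np.stack(backgrounds)                                  # (N, H, W, 3)
    return stacked[source[cell], ys, xs]

# Toy data (assumption): constant-color images stand in for real backgrounds.
rng = np.random.default_rng(0)
H, W = 32, 48
image = rng.integers(0, 256, size=(H, W, 3), dtype=np.uint8)
mask = np.zeros((H, W), dtype=bool)
mask[8:24, 12:36] = True                      # hypothetical concept mask
backgrounds = [np.full((H, W, 3), v, dtype=np.uint8) for v in (10, 120, 240)]

bg = voronoi_background(backgrounds, H, W, n_cells=6, rng=rng)
randomized = paste_foreground(image, mask, bg)
```

The foreground pixels and the concept mask stay untouched, so the same ground truth mask can be reused for the randomized image.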

Paper Structure

This paper contains 35 sections, 2 equations, 4 figures, 5 tables.

Figures (4)

  • Figure 1: Illustration of training and inference of global and local concept embeddings (here for concept cowboy smiley): A CE is a linear classifier on activation map pixels, represented by its normal vector $v$ in latent space. To run inference on a new input image, (1) the activations of the desired layer $L$ are collected, and (2) each pixel is classified using the dot product with the CE, which yields a heatmap of concept presence as the CE prediction. For training, (3) this heatmap is compared against a ground truth concept mask to update the CE weights via gradient descent.
  • Figure 2: Exemplary results for Net2Vec CEs and GloCEs on 4 Pascal VOC concepts. For each concept, the heatmaps resulting from inference of a CE trained on vanilla data and one trained with simple Places205 background randomization (rand.) are shown side-by-side, each for the vanilla original and a Voronoi randomized version of a (randomly chosen) test sample. Ground truth masks (2nd column) and predicted heatmaps (columns 4–7) are shown via overlays: dark/blue means 0, no darkening/red means 1. The figure labeled fig:voronoi visualizes all three considered background randomization techniques.
  • Figure 4: Pairwise cosine similarities between CEs of the same concept and layer but from different train data randomization schemes. Cosine similarities are calculated for matching pairs of CEs for the same layer (late layers only) and concept, then averaged. From left to right, bars indicate the averaged similarity to CEs trained on Places205 (blue), Voronoi (orange), synthetic (green), and vanilla (red) data. Standard deviation is indicated via error bars (black). For visual reference, cosine similarities of CEs with themselves are reported (value of 1).
  • Figure 5: IoU on background-randomized test images (simple background pasting from the Places validation dataset) versus non-randomized ones. Compared are different training schemes: pasting the test image foregrounds onto single unchanged or Voronoi backgrounds from the Places training data, onto synthetic backgrounds, or keeping the original backgrounds (vanilla). Results are shown for Net2Vec (left) and GloCE (right) CEs trained for concepts from Pascal VOC (top) and ImageNetS50 (bottom), averaged over all concepts and 7 DNNs.
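
The captions of Figures 1, 4, and 5 describe a CE as a linear classifier on activation map pixels, trained by gradient descent against concept masks and compared via IoU and cosine similarity. A minimal NumPy sketch of these steps is given below. It assumes a sigmoid output with plain binary cross-entropy, a bias term `b`, and synthetic activations in which one channel encodes the concept; none of these choices is taken from the paper, and the helper names are hypothetical.

```python
import numpy as np

def ce_heatmap(acts, v, b=0.0):
    """Classify each activation-map pixel via the dot product with CE vector v."""
    # acts: (C, H, W) activations of one layer; v: (C,); returns (H, W) in (0, 1)
    logits = np.tensordot(v, acts, axes=([0], [0])) + b
    return 1.0 / (1.0 + np.exp(-logits))

def train_ce(acts_batch, masks, lr=0.5, steps=300):
    """Fit v and b by gradient descent on per-pixel binary cross-entropy (assumed loss)."""
    n, c, h, w = acts_batch.shape
    x = acts_batch.transpose(0, 2, 3, 1).reshape(-1, c)   # one row per pixel
    y = masks.reshape(-1).astype(float)
    v, b = np.zeros(c), 0.0
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(x @ v + b)))
        grad = p - y                                       # d(BCE)/d(logit)
        v -= lr * (x.T @ grad) / len(y)
        b -= lr * grad.mean()
    return v, b

def iou(pred, target):
    """Intersection over union of two boolean masks."""
    inter = np.logical_and(pred, target).sum()
    union = np.logical_or(pred, target).sum()
    return inter / union if union else 1.0

def cosine_similarity(u, v):
    """Cosine of the angle between two CE vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# Synthetic stand-in for DNN activations (assumption): channel 0 carries the concept.
rng = np.random.default_rng(0)
masks = rng.random((8, 6, 6)) < 0.3
acts = rng.normal(size=(8, 4, 6, 6))
acts[:, 0] += 4.0 * masks

v, b = train_ce(acts, masks)
preds = np.stack([ce_heatmap(a, v, b) > 0.5 for a in acts])
score = iou(preds, masks)
self_similarity = cosine_similarity(v, v)
```

In a setup like the one in Figure 5, `train_ce` would be run once per training scheme (vanilla or background-randomized activations) and `iou` evaluated on both randomized and non-randomized test images; `cosine_similarity` between the resulting vectors corresponds to the comparison in Figure 4.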