
Evaluating the Stability of Semantic Concept Representations in CNNs for Robust Explainability

Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, Korinna Bade

TL;DR

This work examines how stable semantic concept representations in CNNs are, with the goal of robust post-hoc explainability in safety-critical computer vision tasks. It introduces a stability evaluation framework that combines supervised TCAV with unsupervised ICE concept mining to build a label-efficient concept pool, and it evaluates both concept retrieval stability and concept attribution stability, the former with a formal stability metric $S_{L_k}^C(X)$. In experiments across six CNN backbones for object detection and classification, the authors find that 1D-CAVs offer the best overall stability, whereas 2D-CAVs suffer from low separability and 3D-CAVs from lower consistency; gradient smoothing (SmoothGrad) mitigates attribution instability in shallow layers. The study gives guidance on choosing the layer and CAV dimensionality for concept analysis (CA) in safety-critical XAI applications and lays groundwork for future comparisons with other global concept representations.
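
To make the gradient smoothing idea concrete, here is a minimal sketch of SmoothGrad-style averaging in plain Python. It is not the authors' implementation: `grad_fn`, `relu_grad`, and all parameter values are hypothetical stand-ins, and a step-shaped toy gradient takes the place of backpropagation through a CNN. The point it illustrates is that averaging gradients over Gaussian-perturbed copies of an input makes attributions for nearly identical inputs agree.

```python
import random


def smoothgrad(grad_fn, x, n_samples=50, sigma=0.1, seed=0):
    """Average grad_fn over Gaussian-perturbed copies of the input x.

    grad_fn is a hypothetical callable mapping a list of floats to a
    gradient list of the same length (a stand-in for backprop through a CNN).
    """
    rng = random.Random(seed)
    total = [0.0] * len(x)
    for _ in range(n_samples):
        # Perturb every input coordinate with zero-mean Gaussian noise.
        noisy = [xi + rng.gauss(0.0, sigma) for xi in x]
        grad = grad_fn(noisy)
        total = [t + g for t, g in zip(total, grad)]
    return [t / n_samples for t in total]


def relu_grad(x):
    """Toy gradient of sum(ReLU(x_i)): a step function that flips at zero."""
    return [1.0 if xi > 0 else 0.0 for xi in x]


# Two nearly identical inputs receive opposite raw gradients ...
raw_a, raw_b = relu_grad([0.01]), relu_grad([-0.01])
# ... while their smoothed gradients end up close to each other.
smooth_a = smoothgrad(relu_grad, [0.01], n_samples=500, sigma=1.0)
smooth_b = smoothgrad(relu_grad, [-0.01], n_samples=500, sigma=1.0)
print(raw_a, raw_b, smooth_a, smooth_b)
```

The noise scale `sigma` and the sample count are illustrative choices only; in practice both would need tuning per layer and model.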

Abstract

Analysis of how semantic concepts are represented within Convolutional Neural Networks (CNNs) is a widely used approach in Explainable Artificial Intelligence (XAI) for interpreting CNNs. A motivation is the need for transparency in safety-critical AI-based systems, as mandated in various domains like automated driving. However, to use the concept representations for safety-relevant purposes, like inspection or error retrieval, these must be of high quality and, in particular, stable. This paper focuses on two stability goals when working with concept representations in computer vision CNNs: stability of concept retrieval and of concept attribution. The guiding use-case is a post-hoc explainability framework for object detection (OD) CNNs, towards which existing concept analysis (CA) methods are successfully adapted. To address concept retrieval stability, we propose a novel metric that considers both concept separation and consistency, and is agnostic to layer and concept representation dimensionality. We then investigate impacts of concept abstraction level, number of concept training samples, CNN size, and concept representation dimensionality on stability. For concept attribution stability we explore the effect of gradient instability on gradient-based explainability methods. The results on various CNNs for classification and object detection yield the main findings that (1) the stability of concept retrieval can be enhanced through dimensionality reduction via data aggregation, and (2) in shallow layers where gradient instability is more pronounced, gradient smoothing techniques are advised. Finally, our approach provides valuable insights into selecting the appropriate layer and concept representation dimensionality, paving the way towards CA in safety-critical XAI applications.
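
The abstract describes the retrieval stability metric only as combining concept separation and consistency. The sketch below shows one way such a score could be computed; it is an assumption, not the paper's definition. It takes consistency to be the mean pairwise cosine similarity between CAVs trained in repeated runs, separability to be the mean test score of those CAVs, and stability to be their product. All function names are hypothetical, and CAVs of any dimensionality are assumed to be flattened into lists of floats beforehand.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity between two flattened CAVs."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def consistency(cavs):
    """Assumed definition: mean pairwise cosine similarity across runs."""
    pairs = list(combinations(cavs, 2))
    return sum(cosine(u, v) for u, v in pairs) / len(pairs)


def separability(test_scores):
    """Assumed definition: mean test performance of the CAVs (e.g. F1)."""
    return sum(test_scores) / len(test_scores)


def stability(cavs, test_scores):
    """Assumed combination: product of consistency and separability."""
    return consistency(cavs) * separability(test_scores)


# Three CAVs from repeated training runs; the third points elsewhere,
# which lowers consistency even though every CAV separates well.
cavs = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
scores = [0.9, 0.9, 0.9]
print(stability(cavs, scores))
```

A product form means a concept scores highly only when its CAVs both separate the concept well and agree across runs, which matches the two failure modes the TL;DR attributes to 2D-CAVs and 3D-CAVs.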

Paper Structure

This paper contains 20 sections, 5 equations, 9 figures, and 12 tables.

Figures (9)

  • Figure 1: The framework for estimation of CAV stability and concept attribution stability. The proposed solution utilizes unsupervised ICE to aid concept discovery and labeling, while supervised TCAV is used for the generation of concept representations.
  • Figure 2: Concept activation vectors (CAVs) of different dimensions.
  • Figure 3: Examples of synthetic concept samples generated using concept superpixels obtained from MS COCO.
  • Figure 4: Impact of the number of concept samples on CAV stability for YOLO5.
  • Figure 5: Impact of the number of concept samples on CAV stability for RCNN.
  • ...and 4 more figures