TraNCE: Transformative Non-linear Concept Explainer for CNNs

Ugochukwu Ejike Akpudo, Yongsheng Gao, Jun Zhou, Andrew Lewis

TL;DR

TraNCE tackles CNN explainability by automatically discovering non-linear concepts from intermediate activations using a VAE-based reducer. It pairs this reducer with a Bessel-function visualization and a Faith score that jointly accounts for Coherence and Fidelity. It demonstrates superior global faithfulness and meaningful concept prototypes on fine-grained visual classification (FGVC) tasks, outperforming several baselines and offering robust local and global explanations. The method emphasizes human-friendly interpretation, transferability of trained explainers, and resilience to certain image transformations. It also acknowledges limitations under high inter-class similarity and its computational cost. Overall, TraNCE advances quantitative, human-centric XAI for CNNs and opens avenues for extensions to video data and transformer architectures.
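The visualization module relies on Bessel functions of the first kind, which are plotted in Figure 3. These functions can be evaluated directly from their power series. The plain-Python sketch below does that. It also adds a hypothetical `radial_bessel_mask` helper, which is our own illustration and not the paper's module. The helper shows how $J_0$ falls smoothly from 1 to 0 across a region. The helper names, the radial layout, and the choice of the first zero of $J_0$ as the cutoff are all assumptions.

```python
import math

def bessel_j(n: int, x: float, terms: int = 60) -> float:
    """J_n(x) for integer n >= 0 via the power series
    J_n(x) = sum_k (-1)^k / (k! (k+n)!) * (x/2)^(2k+n).
    Adequate for moderate |x| (roughly |x| <= 20) in double precision."""
    if n < 0:
        raise ValueError("this sketch only handles n >= 0")
    half = x / 2.0
    total = 0.0
    for k in range(terms):
        total += (-1) ** k * half ** (2 * k + n) / (math.factorial(k) * math.factorial(k + n))
    return total

def radial_bessel_mask(size: int, center: tuple, radius: float) -> list:
    """Hypothetical smooth mask (illustration only, not TraNCE's module):
    weight = J0(z0 * r / radius) inside `radius`, 0 outside, where z0 is the
    first positive zero of J0, so weights decay smoothly from 1 to 0."""
    z0 = 2.404825557695773  # first positive zero of J0
    cy, cx = center
    mask = []
    for y in range(size):
        row = []
        for x in range(size):
            r = math.hypot(y - cy, x - cx)
            # Clamp tiny negative round-off near the boundary to 0.
            row.append(max(0.0, bessel_j(0, z0 * r / radius)) if r < radius else 0.0)
        mask.append(row)
    return mask

if __name__ == "__main__":
    for order in (0, 1, 2):
        print(order, [round(bessel_j(order, x), 4) for x in (0.0, 2.0, 4.0, 6.0)])
    for row in radial_bessel_mask(7, (3, 3), 3.0):
        print(" ".join(f"{v:.2f}" for v in row))
```

The series form avoids any dependency. In practice, a library routine such as SciPy's `scipy.special.jv` is the more robust choice for large arguments.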

Abstract

Convolutional neural networks (CNNs) have achieved remarkable success in various computer vision tasks. However, they are not intrinsically explainable. Feature-level explanations of CNNs reveal where the models looked, while concept-based explainability methods provide insight into what the models saw. However, concept-based methods assume that image activations are linearly reconstructable. This assumption fails to capture the intricate relationships within the activations. Moreover, their reliance on Fidelity alone to evaluate global explanations raises a further concern. For the first time, we address these limitations with the novel Transformative Non-linear Concept Explainer (TraNCE) for CNNs. Unlike existing methods, which assume linear reconstruction, TraNCE captures the intricate relationships within the activations. This study presents three original contributions to the CNN explainability literature:

(i) An automatic concept discovery mechanism based on variational autoencoders (VAEs). This transformative concept discovery process enhances the identification of meaningful concepts from image activations.

(ii) A visualization module that leverages the Bessel function to create a smooth transition between prototypical image pixels. The module reveals not only what the CNN saw but also what it avoided, thereby mitigating the concept-duplication problem documented in previous works.

(iii) A new metric, the Faith score, which integrates both Coherence and Fidelity for a comprehensive evaluation of explainer faithfulness and consistency.
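The linear-reconstruction assumption can be made concrete. Several concept-based methods factor a matrix of non-negative (post-ReLU) activations as $A \approx UW$. The NumPy sketch below uses synthetic data and compares two such factorizations. The first is an orthogonal factorization (truncated SVD), whose projections contain negative values. The second is non-negative matrix factorization (NMF, Lee-Seung multiplicative updates), whose factors stay non-negative. This mirrors the × and ✓ cases in Figure 1. It is illustrative only: the synthetic matrix and the helper names are assumptions, and TraNCE itself replaces both factorizations with a VAE.

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic stand-in for post-ReLU activations: (n*h*w) x c, all entries >= 0.
A = rng.random((200, 16))

def svd_projection(A: np.ndarray, k: int):
    """Orthogonal linear decomposition (truncated, uncentered SVD)."""
    U, S, Vt = np.linalg.svd(A, full_matrices=False)
    return U[:, :k] * S[:k], Vt[:k]  # latent projections, orthonormal basis

def nmf(A: np.ndarray, k: int, iters: int = 500, eps: float = 1e-9, seed: int = 0):
    """Non-orthogonal linear decomposition A ~ U @ W with U, W >= 0,
    using Lee-Seung multiplicative updates (Frobenius loss)."""
    r = np.random.default_rng(seed)
    U = r.random((A.shape[0], k))
    W = r.random((k, A.shape[1]))
    for _ in range(iters):
        # Multiplicative updates keep non-negative factors non-negative.
        W *= (U.T @ A) / (U.T @ U @ W + eps)
        U *= (A @ W.T) / (U @ W @ W.T + eps)
    return U, W

P_svd, basis = svd_projection(A, k=4)
U_nmf, W_nmf = nmf(A, k=4)
print("min SVD projection:", P_svd.min())  # negative: orthogonal components mix signs
print("min NMF factor:", min(U_nmf.min(), W_nmf.min()))  # never below 0
print("NMF relative error:", np.linalg.norm(A - U_nmf @ W_nmf) / np.linalg.norm(A))
```

For a non-negative matrix, every singular component after the first must contain both signs, because it is orthogonal to the (single-signed) leading one. This is why orthogonal projections of ReLU activations are hard to read as concept "amounts".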

Paper Structure

This paper contains 24 sections, 3 theorems, 8 equations, 12 figures, 4 tables, and 1 algorithm.

Key Result

Proposition 1

A trained VAE $\hat{f}_{l_{i}(\theta, \phi)_{min}}(x, \hat{x})$ consists of a reducer $\hat{f}_{\theta_{min}}(x, z)$ and a reconstruction model $\hat{f}_{\phi_{min}}(z, \hat{x})$. The fixed weight matrix $\mathcal{W}_l \in \mathbb{R}^{c^\prime \times c}$ and the embedding $z \in \mathbb{R}^{(h \cdot \ldots)}$ …
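Proposition 1 splits the trained VAE into a reducer and a reconstruction model. The NumPy sketch below shows that split under several assumptions: one activation vector per spatial position, and a single linear layer for each part. The following are all hypothetical choices: the class `TinyVAEReducer`, its untrained random weights, the ReLU on the decoder output, and treating the decoder weight `W_dec` (shape c′ × c) as a stand-in for $\mathcal{W}_l$. Only the reparameterization step and the KL term are the standard VAE ones.

```python
import numpy as np

class TinyVAEReducer:
    """Hypothetical one-layer VAE over per-position activation vectors.
    Weights are random (untrained); this only illustrates the data flow."""

    def __init__(self, c: int, c_prime: int, seed: int = 0):
        self.rng = np.random.default_rng(seed)
        self.W_mu = self.rng.normal(0.0, 0.1, (c, c_prime))      # encoder mean head
        self.W_logvar = self.rng.normal(0.0, 0.1, (c, c_prime))  # encoder log-variance head
        # Assumed stand-in for W_l (c' x c); non-negative like a concept basis.
        self.W_dec = np.abs(self.rng.normal(0.0, 0.1, (c_prime, c)))

    def encode(self, A: np.ndarray):
        # Reducer: activations (h*w, c) -> mean and log-variance (h*w, c').
        return A @ self.W_mu, A @ self.W_logvar

    def reparameterize(self, mu: np.ndarray, logvar: np.ndarray) -> np.ndarray:
        # Standard VAE trick: z = mu + sigma * eps, with eps ~ N(0, I).
        eps = self.rng.standard_normal(mu.shape)
        return mu + np.exp(0.5 * logvar) * eps

    def decode(self, z: np.ndarray) -> np.ndarray:
        # Reconstruction model; ReLU keeps outputs non-negative like CNN activations.
        return np.maximum(z @ self.W_dec, 0.0)

    def forward(self, A: np.ndarray):
        mu, logvar = self.encode(A)
        z = self.reparameterize(mu, logvar)
        A_hat = self.decode(z)
        recon = np.mean((A - A_hat) ** 2)
        # KL(q(z|A) || N(0, I)), summed over latent dims, averaged over positions.
        kl = -0.5 * np.mean(np.sum(1.0 + logvar - mu ** 2 - np.exp(logvar), axis=1))
        return z, A_hat, recon + kl

rng = np.random.default_rng(1)
h, w, c, c_prime = 7, 7, 64, 8
A_l = np.maximum(rng.normal(size=(h * w, c)), 0.0)  # synthetic post-ReLU activations
reducer = TinyVAEReducer(c, c_prime)
z, A_hat, total = reducer.forward(A_l)
print(z.shape, A_hat.shape, float(total))
```

Training would minimize `total` by gradient descent, for example in a deep-learning framework. The sketch stops at the forward pass.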

Figures (12)

  • Figure 1: The rationale for the non-linear reconstructability of image activations produced by a CNN in the latent space. An orthogonal linear decomposition may produce invalid (negative) latent projections (× mark). A non-orthogonal linear decomposition may produce acceptable, non-negative latent projections (✓ mark). In contrast, a non-linear decomposition produces valid non-negative latent projections (✓✓ mark).
  • Figure 2: Local explanations for (a) ResNet50, tested on an image of an Australian Kelpie and a Chihuahua, and (b) InceptionV3, tested on an image of a Macaw and an Eagle. TraNCE produces prototypes for each concept by automatically optimizing image regions from the training images that faithfully represent a concept in the target class. The total Contribution for each target class is the sum, over its concepts, of the products of concept Weights and Similarity scores.
  • Figure 3: Oscillation of the Bessel functions of the first kind of integer orders 0, 1, and 2: $J_0(x)$, $J_1(x)$, and $J_2(x)$.
  • Figure 4: The proposed TraNCE explainability framework. The CNN $f(\cdot)$ comprises Conv-ReLU layers $E(\cdot)$, which generate high-dimensional activations $\mathcal{A}_{l}$ at layer $l$, and a classifier $C(\cdot)$ for predictions. The VAE-based explainer $\hat{f}(\cdot)$ includes an encoder and a decoder. It produces concept activation vectors (CAVs) $\mathcal{W}$ and the embedding $z$ from $\mathcal{A}_{l}$ at the bottleneck. Concepts are derived from the embedding and visualized using the Bessel function, with prototypes sourced from the training set via MMD-Critic. Each concept explanation uses $\mathcal{W}$ to compute local checkpoints: concept Similarity, concept weights $\zeta$, and concept Contribution. Global explanations are provided by $C(\cdot)$, which evaluates the prediction losses between $\mathcal{A}_{l}$ and $\mathcal{A}^\prime_{l}$ as Faith scores.
  • Figure 5: Train/validation history of the proposed TraNCE explainer over 100 iterations for (a) ResNet50's explanation using an image of an Australian Kelpie and a Chihuahua, and (b) InceptionV3's explanation using an image of a Macaw and an Eagle.
  • ...and 7 more figures
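Figures 2 and 4 describe two checkpoints that are easy to sketch. The first is a class's total Contribution, computed as the sum of concept Weights times Similarity scores. The second is a global check that compares classifier predictions on original and reconstructed activations. In the NumPy sketch below, cosine similarity and prediction agreement are assumptions standing in for the paper's exact definitions. The Faith score's Coherence component is not reproduced, and all function names are hypothetical.

```python
import numpy as np

def concept_similarity(a: np.ndarray, concepts: np.ndarray) -> np.ndarray:
    """Assumed Similarity: cosine similarity between a pooled activation
    vector `a` (shape (c,)) and each concept vector (rows of `concepts`, (c', c))."""
    norms = np.linalg.norm(concepts, axis=1) * np.linalg.norm(a)
    return concepts @ a / np.maximum(norms, 1e-12)

def class_contribution(weights: np.ndarray, similarity: np.ndarray) -> float:
    """Total Contribution: sum over concepts of Weight * Similarity (Figure 2)."""
    return float(np.sum(weights * similarity))

def prediction_agreement(classifier, A: np.ndarray, A_rec: np.ndarray) -> float:
    """Assumed Fidelity-style check: fraction of samples whose predicted class
    from reconstructed activations matches the prediction from the originals."""
    return float(np.mean(classifier(A).argmax(axis=1) == classifier(A_rec).argmax(axis=1)))

rng = np.random.default_rng(0)
W_clf = rng.normal(size=(16, 3))  # hypothetical linear classifier head C(.)

def classifier(X: np.ndarray) -> np.ndarray:
    return X @ W_clf

A = rng.random((50, 16))                     # synthetic pooled activations
A_rec = A + 0.01 * rng.normal(size=A.shape)  # stand-in for reconstructed activations
sim = concept_similarity(A[0], rng.random((4, 16)))
print("Contribution:", class_contribution(np.full(4, 0.25), sim))
print("Agreement:", prediction_agreement(classifier, A, A_rec))
```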

Theorems & Definitions (5)

  • Definition 1
  • Proposition 1
  • Definition 2
  • Proposition 2
  • Proposition 3