Q-SENN: Quantized Self-Explaining Neural Networks

Thomas Norrenbrock; Marco Rudolph; Bodo Rosenhahn

TL;DR

The paper tackles the lack of faithful explanations in deep networks by extending Self-Explaining Neural Networks (SENN) to Quantized-SENN (Q-SENN), whose sparse, ternary final layer assigns each class an average of $5$ interpretable features. It iteratively optimizes a quantized weight matrix so that every feature-class relationship is positive, negative, or neutral, improving Fidelity, Diversity, and Grounding while maintaining most or all of the accuracy of an uninterpretable baseline. A CLIP-based alignment method maps learned features to human language-based concepts without additional annotations, so explanations can be verbalized. Overall, Q-SENN improves interpretability while retaining accuracy and shows robustness to spurious correlations across several vision benchmarks, which makes it relevant for safe, explainable AI on complex datasets.

Abstract

Explanations in Computer Vision are often desired, but most Deep Neural Networks can only provide saliency maps with questionable faithfulness. Self-Explaining Neural Networks (SENN) extract interpretable concepts with fidelity, diversity, and grounding to combine them linearly for decision-making. While they can explain what was recognized, initial realizations lack accuracy and general applicability. We propose the Quantized-Self-Explaining Neural Network Q-SENN. Q-SENN satisfies or exceeds the desiderata of SENN while being applicable to more complex datasets and maintaining most or all of the accuracy of an uninterpretable baseline model, out-performing previous work in all considered metrics. Q-SENN describes the relationship between every class and feature as either positive, negative or neutral instead of an arbitrary number of possible relations, enforcing more binary human-friendly features. Since every class is assigned just 5 interpretable features on average, Q-SENN shows convincing local and global interpretability. Additionally, we propose a feature alignment method, capable of aligning learned features with human language-based concepts without additional supervision. Thus, what is learned can be more easily verbalized. The code is published: https://github.com/ThomasNorr/Q-SENN
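The ternary decision layer described in the abstract can be illustrated with a minimal sketch. It assumes that a class logit is a single shared scale times the signed sum of the features related to that class; the function names, the feature names, and the toy numbers are hypothetical and not taken from the published code.

```python
# Minimal sketch of a ternary final layer (assumed form, not the authors' code).
# relations[c][j] is +1 (positive), -1 (negative) or 0 (neutral).

def class_logits(features, relations, alpha):
    """Return one logit per class: alpha * sum_j relations[c][j] * features[j]."""
    return [alpha * sum(r * f for r, f in zip(row, features)) for row in relations]


def explain(features, relations, class_index, names):
    """List the features a class actually uses, with the sign of the relation."""
    row = relations[class_index]
    return [
        (names[j], "positive" if r > 0 else "negative", features[j])
        for j, r in enumerate(row)
        if r != 0  # neutral features do not appear in the explanation
    ]


# Hypothetical toy setup: 4 features, 2 classes.
names = ["f0", "f1", "f2", "f3"]
features = [2.0, 0.5, 1.0, 0.0]
relations = [
    [1, 0, -1, 0],  # class 0: f0 positive, f2 negative
    [0, 1, 0, 1],   # class 1: f1 and f3 positive
]
logits = class_logits(features, relations, alpha=0.5)
explanation = explain(features, relations, 0, names)
```

Because neutral entries drop out, the explanation for a class only ever lists the few features that class is assigned, which is what keeps the local and global explanations short.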

Paper Structure (31 sections, 14 equations, 37 figures, 15 tables)

Introduction
Related Work
SLDD-Model
Method
Quantization
Experiments
Fidelity
Diversity
Grounding
Interpretability Tradeoff
Alignment Without Annotations
CLIP
Feature Alignment
Validation
Optimal Static Baseline
...and 16 more sections

Figures (37)

Figure 1: Q-SENN optimizes for Diversity, Grounding and Fidelity: The global explanation shows one example class being recognized through $5$ interpretable features that show high Diversity and Grounding, consistently localizing on the same meaningful human attributes across images, e.g., belly, crown, upper tail, upper wing and eye. When measuring Fidelity, the features generalize to unseen data and the local explanation fits the class explanation. Visualization techniques are based on overlaying color-coded feature maps, described in the supplementary material.
Figure 2: Overview of our proposed pipeline to construct a Q-SENN.
Figure 3: Exemplary result (right) of quantization on cumulative distribution (left) of nonzero weights in the sparse weight matrix (superscript $\mathrm{sp}$) for CUB ([...] $= 1000$): Weights are set to $0$ or $\pm\alpha$, where $\alpha$ is the average of all remaining values above $0$.
Figure 4: Exemplary local explanations in comparison: Q-SENN offers explanations based on interpretable features.
Figure 5: Relationship between Accuracy and interpretability-related parameters for Q-SENN with [...].
...and 32 more figures
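The quantization step summarized in the Figure 3 caption (weights become $0$ or $\pm\alpha$) can be sketched as follows. This is one possible reading under stated assumptions, not the published implementation: the magnitude cutoff `tau` that decides which weights become $0$ is hypothetical, and $\alpha$ is taken as the mean of the kept weights that are above $0$.

```python
# Sketch of ternary quantization under assumptions (not the authors' code).

def ternarize(weights, tau):
    """Map each weight to 0, +alpha or -alpha.

    tau is a hypothetical magnitude cutoff; the caption does not state how
    the zeroed weights are selected.
    """
    kept = [w for w in weights if abs(w) > tau]
    positives = [w for w in kept if w > 0]
    # Assumed reading of the caption: alpha averages the kept values above 0.
    alpha = sum(positives) / len(positives) if positives else 0.0
    quantized = []
    for w in weights:
        if abs(w) <= tau:
            quantized.append(0.0)     # neutral relation
        elif w > 0:
            quantized.append(alpha)   # positive relation
        else:
            quantized.append(-alpha)  # negative relation
    return quantized, alpha


# Toy weights for one class (hypothetical values).
quantized, alpha = ternarize([0.5, -0.25, 0.0625, 1.5, -1.0], tau=0.125)
```

After this step every nonzero entry has the same magnitude, so only the sign of a feature-class relation carries information.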
