Interpretable Network Visualizations: A Human-in-the-Loop Approach for Post-hoc Explainability of CNN-based Image Classification

Matteo Bianchi; Antonio De Santis; Andrea Tocchetti; Marco Brambilla

TL;DR

The paper addresses the opacity of CNN image classifiers by proposing Interpretable Network Visualizations (INVs), a post-hoc, human-in-the-loop framework. For local explanations, the feature maps of each layer are clustered and each cluster is merged into a cluster map $A_{C_i}$ with a Grad-CAM-based weight $w_{C_i}$; the resulting layer-wise saliency maps are linked to crowdsourced labels. Labels are gathered through the Deep Reveal crowdsourcing game and refined with Sentence-BERT-based NLP. Aggregating labels across images yields global explanations, with TCAV used for validation. In human-subject evaluations, INVs were rated more informative than Grad-CAM, LIME, and SHAP, supporting the practical value of combining human concepts with post-hoc explanations for CNNs.

Abstract

Transparency and explainability in image classification are essential for establishing trust in machine learning models and detecting biases and errors. State-of-the-art explainability methods generate saliency maps to show where a specific class is identified, without providing a detailed explanation of the model's decision process. To address this gap, we introduce a post-hoc method that explains the entire feature extraction process of a Convolutional Neural Network. These explanations include a layer-wise representation of the features the model extracts from the input. Such features are represented as saliency maps generated by clustering and merging similar feature maps, each associated with a weight derived from a generalization of Grad-CAM. To further enhance these explanations, we include a set of textual labels collected through a gamified crowdsourcing activity and processed using NLP techniques and Sentence-BERT. Finally, we show an approach to generate global explanations by aggregating labels across multiple images.
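The clustering-and-weighting step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the summary does not give the paper's equations, so the choices here are assumptions made for illustration. Specifically, feature maps are grouped with a small hand-written k-means on L2-normalized maps, a cluster map is taken as the sum of its member maps, and the cluster weight is taken as the sum of the standard Grad-CAM channel weights (the spatial mean of the gradients). The names `gradcam_channel_weights`, `kmeans`, and `cluster_maps` are hypothetical.

```python
import numpy as np

def gradcam_channel_weights(gradients):
    """Standard Grad-CAM channel weights: the spatial mean of the gradient
    of the class score w.r.t. each feature map. gradients: (K, H, W)."""
    return gradients.mean(axis=(1, 2))

def kmeans(points, k, n_iter=20, seed=0):
    """Tiny k-means, used only to keep the sketch self-contained."""
    rng = np.random.default_rng(seed)
    centers = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(n_iter):
        # Squared Euclidean distance from every point to every center.
        dists = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        assign = dists.argmin(axis=1)
        for c in range(k):
            members = points[assign == c]
            if len(members) > 0:
                centers[c] = members.mean(axis=0)
    return assign

def cluster_maps(feature_maps, gradients, k):
    """Group the K feature maps of one layer into at most k cluster maps.

    Returns a list of (cluster_map, weight, member_indices) tuples.
    ASSUMPTION: cluster map = sum of member maps, weight = sum of member
    Grad-CAM weights. The paper's exact definitions may differ.
    """
    n_maps = feature_maps.shape[0]
    alphas = gradcam_channel_weights(gradients)
    flat = feature_maps.reshape(n_maps, -1)
    norms = np.linalg.norm(flat, axis=1, keepdims=True)
    flat = flat / np.maximum(norms, 1e-12)  # compare shapes, not magnitudes
    assign = kmeans(flat, k)
    result = []
    for c in range(k):
        idx = np.flatnonzero(assign == c)
        if idx.size == 0:
            continue  # k-means can leave a cluster empty
        merged = feature_maps[idx].sum(axis=0)
        weight = float(alphas[idx].sum())
        result.append((merged, weight, idx))
    return result
```

In practice `feature_maps` and `gradients` would come from a forward and backward pass through the CNN at a chosen layer; random arrays of shape `(K, H, W)` are enough to exercise the function.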

Paper Structure (14 sections, 3 equations, 4 figures, 2 tables)

Introduction
Background and Related Works
Human Knowledge and Explainable AI
Interpretable Network Visualizations
Feature Maps Analysis
Human Knowledge Collection
Label Analysis
Experiment Setup
Results and Discussion
INVs Evaluation
Comparative Analysis with Human Subjects
Towards Global Explanations
A Discussion on Deep Reveal
Conclusion and Future Works

Figures (4)

Figure 1: A pipeline showing the process of generating an INV. In the first step, feature maps and their weights are extracted from the CNN. These feature maps are clustered to generate cluster maps. Subsequently, labels are collected through crowdsourcing and processed using NLP techniques. Finally, cluster maps with the same top label are merged.
Figure 2: A pipeline describing the label collection process through Deep Reveal. The masked version of the cluster map is shown to users who can try to guess right away or increase the visible portion of the image. After guessing, users provide labels to explain their decision.
Figure 3: An INV for an image of the class "church". The most important feature for this prediction was "steeple". However, it can be observed that other elements, such as "cross" and "roof" also contributed to the classification.
Figure 4: An example of a global explanation of the class "tench" for the last convolutional layer. It provides a set of features described by a label and associated with a weight and a TCAV score.
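
The idea of building a global explanation by aggregating labels across images can be sketched as follows. This is an illustrative assumption rather than the paper's procedure: each image is taken to contribute a list of `(label, weight)` pairs for one layer, and labels are ranked by their weight averaged over all images. The function name `aggregate_labels` and the example labels and weights are hypothetical, and TCAV scoring is not modeled.

```python
from collections import defaultdict

def aggregate_labels(per_image_features):
    """Aggregate per-image (label, weight) pairs into a class-level ranking.

    per_image_features: one list of (label, weight) pairs per image.
    Returns (label, mean_weight, n_occurrences) tuples sorted by mean weight.
    ASSUMPTION: the mean is taken over all images, so a label missing from
    an image contributes a weight of zero for that image.
    """
    n_images = len(per_image_features)
    if n_images == 0:
        return []
    totals = defaultdict(float)
    counts = defaultdict(int)
    for features in per_image_features:
        for label, weight in features:
            totals[label] += weight
            counts[label] += 1
    ranked = [(label, totals[label] / n_images, counts[label]) for label in totals]
    ranked.sort(key=lambda item: item[1], reverse=True)
    return ranked

# Hypothetical labels and weights for three images of one class.
example = [
    [("fish", 0.5), ("hand", 0.25)],
    [("fish", 0.75)],
    [("hand", 0.25), ("grass", 0.125)],
]
global_explanation = aggregate_labels(example)
```

Averaging over all images rather than only over the images where a label appears favors features that are both strong and frequent; other aggregation rules are equally plausible.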
