Interpretable Network Visualizations: A Human-in-the-Loop Approach for Post-hoc Explainability of CNN-based Image Classification
Matteo Bianchi, Antonio De Santis, Andrea Tocchetti, Marco Brambilla
TL;DR
The paper addresses opacity in CNN image classifiers by proposing Interpretable Network Visualization (INV), a post-hoc, human-in-the-loop framework. Local explanations are produced by clustering the feature maps of each layer and merging every cluster $C_i$ into a layer-wise saliency map $A_{C_i}$, which carries a Grad-CAM-based weight $w_{C_i}$ and is linked to crowdsourced labels. Labels are gathered through the Deep Reveal crowdsourcing game and refined with Sentence-BERT-based NLP; aggregating them across images yields global explanations, with TCAV used for validation. In human-subject evaluations, INV yields higher informativeness than Grad-CAM, LIME, and SHAP, supporting the practical value of combining human concepts with post-hoc explanations for CNNs.
Abstract
Transparency and explainability in image classification are essential for establishing trust in machine learning models and for detecting biases and errors. State-of-the-art explainability methods generate saliency maps that show where a specific class is identified, but they do not provide a detailed explanation of the model's decision process. To address this gap, we introduce a post-hoc method that explains the entire feature extraction process of a Convolutional Neural Network. These explanations include a layer-wise representation of the features the model extracts from the input. Such features are represented as saliency maps generated by clustering and merging similar feature maps, each of which is assigned a weight derived by generalizing Grad-CAM to the proposed methodology. To further enhance these explanations, we include a set of textual labels collected through a gamified crowdsourcing activity and processed using NLP techniques and Sentence-BERT. Finally, we show an approach to generate global explanations by aggregating labels across multiple images.
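The clustering-and-merging step described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation, and several choices are assumptions made only for illustration: the function names `kmeans` and `cluster_saliency` are hypothetical, k-means stands in for whichever clustering algorithm the paper uses, each cluster map $A_{C_i}$ is taken as the mean of its member feature maps, and each weight $w_{C_i}$ is taken as the sum of the standard Grad-CAM channel weights (spatially averaged gradients) of the members.

```python
import numpy as np


def kmeans(x, k, n_iter=20, seed=0):
    """Plain k-means on the rows of x; returns one cluster id per row."""
    rng = np.random.default_rng(seed)
    centers = x[rng.choice(len(x), size=k, replace=False)]
    assign = np.zeros(len(x), dtype=int)
    for _ in range(n_iter):
        # Squared Euclidean distance from every row to every center.
        dists = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        assign = dists.argmin(axis=1)
        for j in range(k):
            members = x[assign == j]
            if len(members) > 0:  # keep the old center if a cluster empties
                centers[j] = members.mean(axis=0)
    return assign


def cluster_saliency(feature_maps, gradients, k=3):
    """Group one layer's feature maps and attach a Grad-CAM-style weight.

    feature_maps, gradients: float arrays of shape (K, H, W), where
    gradients holds d(class score)/d(feature map) for the same layer.
    Returns a list of (member_indices, cluster_map, weight) tuples.
    """
    n_maps = feature_maps.shape[0]
    # Standard Grad-CAM channel weights: spatial mean of the gradients.
    alpha = gradients.mean(axis=(1, 2))
    assign = kmeans(feature_maps.reshape(n_maps, -1), k)
    results = []
    for j in range(k):
        idx = np.flatnonzero(assign == j)
        if idx.size == 0:
            continue
        # ASSUMPTION: merge a cluster by averaging its feature maps.
        cluster_map = feature_maps[idx].mean(axis=0)
        # ASSUMPTION: cluster weight = sum of member Grad-CAM weights.
        weight = float(alpha[idx].sum())
        results.append((idx, cluster_map, weight))
    return results


# Usage with random stand-ins for real activations and gradients.
rng = np.random.default_rng(1)
fmaps = rng.random((12, 7, 7))
grads = rng.standard_normal((12, 7, 7))
for idx, cmap, w in cluster_saliency(fmaps, grads, k=3):
    print(idx.tolist(), cmap.shape, round(w, 4))
```

In a real setting the activations and gradients would come from a trained CNN, for example via framework hooks on the layer of interest. Under the summation assumption, the cluster weights of a layer add up to the total of the per-channel Grad-CAM weights, so no channel's contribution is dropped by the grouping.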
