Batch-CAM: Introduction to better reasoning in convolutional deep learning models
Giacomo Ignesti, Davide Moroni, Massimo Martinelli
TL;DR
The paper addresses the opacity of deep learning models in computer vision and proposes Batch-CAM, a training paradigm that fuses a batch-wise Grad-CAM explanation with a prototypical reconstruction loss to steer learning toward evidence-relevant features. The method introduces two losses, Prototype Loss and Batch-CAM Prototype Loss, that align the model's attention with class prototypes computed from the data rather than from annotations. Empirical results on MNIST and Fashion-MNIST across SimpleCNN, ResNet-18, and ConvNeXt-V2-Tiny show consistent gains in classification accuracy and reconstruction quality, along with more coherent saliency maps and efficient batch-wise Grad-CAM computation. The approach advances trustworthy AI by integrating explainability into the training objective, with potential for extension to more complex domains and richer prototype representations.
Abstract
Understanding the inner workings of deep learning models is crucial for advancing artificial intelligence, particularly in high-stakes fields such as healthcare, where accurate explanations are as vital as precision. This paper introduces Batch-CAM, a novel training paradigm that fuses a batch implementation of the Grad-CAM algorithm with a prototypical reconstruction loss. This combination guides the model to focus on salient image features, thereby enhancing its performance across classification tasks. Our results demonstrate that Batch-CAM simultaneously improves accuracy and image reconstruction quality while reducing training and inference times. By ensuring that models learn from evidence-relevant information, this approach contributes to building more transparent, explainable, and trustworthy AI systems.
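To make the idea of combining a batch-wise Grad-CAM with a prototype-based loss concrete, the following is a minimal NumPy sketch, not the authors' implementation. It rests on several assumptions that this summary does not confirm: the classifier head is global average pooling followed by a linear layer (so the Grad-CAM gradients have a closed form and no autograd is needed), a class prototype is taken to be the per-class mean image, and the alignment term is a plain mean squared error. The function names `batch_grad_cam`, `class_prototypes`, and `prototype_alignment_loss` are hypothetical.

```python
import numpy as np

def batch_grad_cam(feats, weights, labels):
    """Grad-CAM for a whole batch at once, ASSUMING a GAP + linear head.

    feats:   (B, K, H, W) activations of the last convolutional layer
    weights: (C, K) weights of the linear layer applied after pooling
    labels:  (B,) class index to explain for each sample
    """
    B, K, H, W = feats.shape
    # With logit_c = sum_k weights[c, k] * mean_{h,w} feats[k, h, w], the
    # gradient w.r.t. every location of channel k is weights[c, k] / (H * W),
    # so the spatially averaged gradient (Grad-CAM channel weight) equals it.
    alpha = weights[labels] / (H * W)                 # (B, K)
    cams = np.einsum("bk,bkhw->bhw", alpha, feats)    # weighted channel sum
    cams = np.maximum(cams, 0.0)                      # ReLU: positive evidence
    peak = cams.reshape(B, -1).max(axis=1).reshape(B, 1, 1)
    return cams / np.maximum(peak, 1e-8)              # each map scaled to [0, 1]

def class_prototypes(images, labels, num_classes):
    """Per-class mean image (an ASSUMED prototype definition).

    Assumes every class occurs at least once in `labels`.
    """
    return np.stack([images[labels == c].mean(axis=0)
                     for c in range(num_classes)])

def prototype_alignment_loss(cams, prototypes, labels):
    """MSE between each map and its class prototype (ASSUMED loss form).

    Both inputs must live on the same H x W grid.
    """
    return float(np.mean((cams - prototypes[labels]) ** 2))

# Toy usage with random data; shapes are illustrative only.
rng = np.random.default_rng(0)
feats = rng.random((4, 8, 7, 7))       # batch of 4, 8 channels, 7x7 maps
weights = rng.normal(size=(3, 8))      # 3 classes
labels = np.array([0, 1, 2, 1])
images = rng.random((4, 7, 7))         # stand-in images at map resolution

cams = batch_grad_cam(feats, weights, labels)
protos = class_prototypes(images, labels, num_classes=3)
loss = prototype_alignment_loss(cams, protos, labels)
```

In a real training loop the maps would typically be computed with automatic differentiation, resized to the image resolution, and the alignment term added to the classification loss with a weighting factor. None of those details are specified here, so they are left out of the sketch.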
