Feature CAM: Interpretable AI in Image Classification
Frincy Clement, Ji Yang, Irene Cheng
TL;DR
The paper addresses the interpretability gap in image classification by introducing Feature CAM, a perturbation-activation fusion that augments Activation-Based Methods with edge-based feature descriptors from Holistically Nested Edge Detection. By generating three input variants and leveraging Grad-CAM saliency maps, Feature CAM yields fine-grained, class-discriminative visualizations that enhance human interpretability while preserving machine confidence across multiple CNN architectures. Empirical results show a 3–4× improvement in human interpretability over Grad-CAM-based baselines, with robust machine interpretability maintained, especially for lighter models. A key limitation is the dependence on Grad-CAM for localization, and future work aims to establish a stronger, model-agnostic localization baseline to broaden applicability.
Abstract
Deep Neural Networks are often called black boxes because of their complex, deep architectures and the opacity of their inner layers. This creates a lack of trust in using Artificial Intelligence in critical, high-precision fields such as security, finance, health, and manufacturing. Considerable work has focused on interpretable models that deliver meaningful insights into the reasoning and behavior of neural networks. In our research, we compare the state-of-the-art Activation-Based Methods (ABM) for interpreting the predictions of CNN models, specifically in the application of Image Classification. We then extend the comparison to eight CNN-based architectures to examine differences in visualization and thus interpretability. We introduce a novel technique, Feature CAM, which falls in the perturbation-activation combination category, to create fine-grained, class-discriminative visualizations. In our experiments, the resulting saliency maps proved to be 3-4 times more human-interpretable than the state-of-the-art in ABM. At the same time, Feature CAM preserves machine interpretability, measured as the average confidence score in classification.
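
To make two of the quantities named in the abstract concrete, the following is a minimal NumPy sketch, not the authors' implementation. It shows the standard Grad-CAM computation that activation-based methods build on (channel weights from spatially averaged gradients, then a ReLU over the weighted sum of feature maps), and one plausible reading of machine interpretability as the mean confidence assigned to the target class. The function names `grad_cam` and `mean_confidence`, the array shapes, and the choice of the target-class probability as the "confidence score" are assumptions made for illustration; the edge-based fusion step of Feature CAM is not sketched here.

```python
import numpy as np


def grad_cam(activations, gradients):
    """Standard Grad-CAM map from one convolutional layer.

    activations: (K, H, W) feature maps of the chosen layer.
    gradients:   (K, H, W) gradients of the class score w.r.t. those maps.
    Returns an (H, W) saliency map scaled to [0, 1].
    """
    # Channel importance: global average of the gradients over space.
    weights = gradients.mean(axis=(1, 2))             # shape (K,)
    # Weighted sum of the feature maps over the channel axis.
    cam = np.tensordot(weights, activations, axes=1)  # shape (H, W)
    # Keep only features with a positive influence on the class.
    cam = np.maximum(cam, 0.0)
    peak = cam.max()
    return cam / peak if peak > 0 else cam


def mean_confidence(probs, labels):
    """Average softmax confidence of the target class (assumed metric).

    probs:  (N, C) class probabilities, one row per image.
    labels: (N,) integer index of the target class for each image.
    """
    probs = np.asarray(probs, dtype=float)
    labels = np.asarray(labels, dtype=int)
    return float(probs[np.arange(len(labels)), labels].mean())
```

Under these assumptions, comparing `mean_confidence` on the original images against the same value computed on saliency-masked images would indicate how much machine confidence a visualization method retains.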
