Are Explanations Helpful? A Comparative Analysis of Explainability Methods in Skin Lesion Classifiers
Rosa Y. G. Paccotacya-Yanque, Alceu Bissoto, Sandra Avila
TL;DR
The paper compares seven post-hoc explainability methods for deep skin-lesion classifiers: four pixel-attribution methods (Grad-CAM, Score-CAM, LIME, SHAP) and three concept-based methods (ACE, ICE, CME). All are applied to an Inception-v4 model trained on ISIC 2018 Task 3 data, which reaches $89.96\% \pm 0.52$ ROC AUC. The paper formalizes three desiderata for explanations (fidelity, meaningfulness, and effectiveness) and evaluates how well each method meets them. Pixel-attribution methods reveal biases and spurious correlations but often do not sufficiently justify predictions, while concept-based methods offer higher-level explanations whose interpretability and fidelity vary (e.g., ICE $11.83\%$ relative error; CME $0.88$ ROC AUC). The study suggests that no single explainability approach suffices and that a combined, clinician-informed strategy is more promising for trustworthy deployment. Future work includes physician-perception studies and broader datasets and architectures.
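The summary reports fidelity for ICE as a relative error without giving the formula. As a rough illustration only, the sketch below computes a generic L2 relative error between a model's original outputs and the outputs obtained from a concept-based reconstruction. The function name `relative_error` and the choice of the L2 norm are assumptions made for this example; the paper's exact definition may differ.

```python
import math


def relative_error(original, reconstructed):
    """Generic L2 relative error: ||original - reconstructed|| / ||original||.

    Hypothetical helper for illustration; not taken from the paper.
    """
    if len(original) != len(reconstructed):
        raise ValueError("vectors must have the same length")
    # Norm of the difference between the two output vectors.
    diff_norm = math.sqrt(sum((o - r) ** 2 for o, r in zip(original, reconstructed)))
    # Norm of the original outputs, used as the reference scale.
    ref_norm = math.sqrt(sum(o ** 2 for o in original))
    if ref_norm == 0:
        raise ValueError("original output has zero norm")
    return diff_norm / ref_norm


# Toy values, not data from the paper.
original_logits = [3.0, 4.0]
reconstructed_logits = [3.0, 3.5]
error = relative_error(original_logits, reconstructed_logits)
print(f"relative error: {error:.2%}")
```

Under this reading, a lower value means the concept-based reconstruction reproduces the model's outputs more closely, so it is a more faithful explanation.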
Abstract
Deep learning (DL) has shown outstanding results in computer vision tasks, and healthcare is no exception. However, there is no straightforward way to expose the decision-making process of DL models. Good accuracy alone is not enough for skin cancer predictions: understanding the model's behavior is crucial for clinical application and reliable outcomes. In this work, we identify desiderata for explanations in skin-lesion models. We analyze seven methods, four based on pixel attribution (Grad-CAM, Score-CAM, LIME, SHAP) and three on high-level concepts (ACE, ICE, CME), for a deep neural network trained on the International Skin Imaging Collaboration Archive. Our findings indicate that while these techniques reveal biases, there is room for improving the comprehensiveness of explanations to achieve transparency in skin-lesion models.
