XAI for Skin Cancer Detection with Prototypes and Non-Expert Supervision

Miguel Correia; Alceu Bissoto; Carlos Santiago; Catarina Barata

TL;DR

This work tackles the interpretability gap in melanoma vs. nevus classification from dermoscopy images by extending ProtoPNet with non-expert supervision, provided through binary lesion masks and remembered prototypes. The proposed architecture combines a CNN backbone, a prototype layer, and a final classifier, and uses prototype-specific activations with top-$k$ pooling to produce transparent decisions. Results on ISIC 2019, together with generalization tests on $\text{PH}^2$ and Derm7pt, show that non-expert supervision improves prototype quality and can match or exceed non-interpretable baselines, with $L_\text{P}$ + $L_\text{M}$ frequently performing best. The findings highlight the potential of non-expert input to yield clinically relevant prototypes, and point to future work that incorporates expert feedback to validate clinical impact.
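The pipeline summarized above (feature map, prototype layer, top-$k$ pooling, final classifier) can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. It assumes the log-ratio similarity commonly associated with ProtoPNet, averages the top-$k$ activations per prototype, and uses hypothetical names (`prototype_scores`, `classify`, `k`, `eps`) and toy shapes chosen only for illustration.

```python
import numpy as np

def prototype_scores(features, prototypes, k=3, eps=1e-4):
    """Score each prototype against a feature map (illustrative sketch).

    features:   (H, W, D) array, e.g. the output of a CNN backbone.
    prototypes: (P, D) array of prototype vectors.
    Returns a (P,) array: the mean of the k highest similarities per prototype.
    """
    patches = features.reshape(-1, features.shape[-1])            # (H*W, D)
    # Squared L2 distance between every patch and every prototype.
    diff = patches[:, None, :] - prototypes[None, :, :]           # (H*W, P, D)
    d2 = (diff ** 2).sum(axis=-1)                                 # (H*W, P)
    # Assumed similarity: large when the distance is small.
    sim = np.log((d2 + 1.0) / (d2 + eps))                         # (H*W, P)
    # Top-k average pooling over spatial positions.
    k = min(k, sim.shape[0])
    topk = np.sort(sim, axis=0)[-k:, :]                           # (k, P)
    return topk.mean(axis=0)

def classify(scores, weights):
    """Final linear classifier: weights has shape (C, P), scores has shape (P,)."""
    return weights @ scores

# Toy example: one patch matches prototype 0 exactly.
features = np.zeros((2, 2, 2))
features[0, 0] = [1.0, 1.0]
prototypes = np.array([[1.0, 1.0], [5.0, 5.0]])
scores = prototype_scores(features, prototypes, k=1)
# Hypothetical weights: prototype 0 votes for class 0, prototype 1 for class 1.
weights = np.array([[1.0, -0.5], [-0.5, 1.0]])
logits = classify(scores, weights)
```

In this toy case prototype 0 receives the highest score, so class 0 gets the larger logit. Each logit is a weighted sum of prototype scores, which is what makes the decision traceable to individual prototypes.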

Abstract

Skin cancer detection through dermoscopy image analysis is a critical task. However, existing models used for this purpose often lack interpretability and reliability, raising the concern of physicians due to their black-box nature. In this paper, we propose a novel approach for the diagnosis of melanoma using an interpretable prototypical-part model. We introduce a guided supervision based on non-expert feedback through the incorporation of: 1) binary masks, obtained automatically using a segmentation network; and 2) user-refined prototypes. These two distinct information pathways aim to ensure that the learned prototypes correspond to relevant areas within the skin lesion, excluding confounding factors beyond its boundaries. Experimental results demonstrate that, even without expert supervision, our approach achieves superior performance and generalization compared to non-interpretable models.
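To make the role of the binary masks concrete, the sketch below shows one way a penalty could discourage prototype activations outside the lesion. The exact form of the mask-based term is not given in this summary, so this is a hypothetical stand-in rather than the paper's loss: it measures the fraction of a prototype's activation mass that falls outside the mask. The function name `outside_mask_penalty` is illustrative, and the mask is assumed to be already resized to the activation map's resolution.

```python
import numpy as np

def outside_mask_penalty(activation, mask):
    """Fraction of activation mass outside the lesion (hypothetical penalty).

    activation: (H, W) non-negative prototype activation map.
    mask:       (H, W) binary mask, 1 inside the lesion and 0 outside,
                assumed to match the activation map's resolution.
    Returns a float in [0, 1]; 0 means all activation lies inside the lesion.
    """
    activation = np.asarray(activation, dtype=float)
    mask = np.asarray(mask, dtype=float)
    total = activation.sum()
    if total == 0:
        return 0.0  # no activation anywhere, nothing to penalize
    outside = (activation * (1.0 - mask)).sum()
    return float(outside / total)

# Toy example: the lesion occupies the left column of a 2x2 map.
mask = np.array([[1, 0],
                 [1, 0]])
inside_act = np.array([[2.0, 0.0], [1.0, 0.0]])   # activation only on the lesion
border_act = np.array([[0.0, 3.0], [0.0, 1.0]])   # activation only off the lesion
mixed_act = np.array([[1.0, 1.0], [0.0, 0.0]])    # half on, half off
```

Under this assumption, a prototype that fires on dark image corners or borders would receive a penalty close to 1, while one that fires within the lesion would receive a penalty close to 0.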

Paper Structure (9 sections, 7 equations, 5 figures, 2 tables)

Introduction
Proposed Approach
Model Architecture
Prototype Learning
Non-expert Supervision of Prototypes
Experimental Results
Experimental Setup
Results and Analysis
Conclusions

Figures (5)

Figure 1: Interpretable model for skin cancer classification of Melanoma vs Nevus. The model is based on ProtoPNet (Chen et al., 2018) and enables non-expert supervision of prototypes through the use of a binary mask or human feedback.
Figure 2: Melanoma prototypes obtained using the EfficientNet B3 CNN backbone with the best-performing models on the ISIC 2019 validation set, for different approaches: $L_\text{P}$ + $L_\text{M}$ (first row), $L_\text{P}$ + $L_\text{R}$ (second row), and $L_\text{P}$ (third row). With $L_\text{P}$ alone, prototypes are often found near black edges and corners rather than within the lesion, unlike with the other two approaches.
Figure 3: Explanation of the $L_\text{P}$ + $L_\text{M}$ approach for a test case from the $\text{PH}^2$ dataset, showing the top 3 activated prototypes with the ResNet-50 backbone. The first prototype is melanoma, while the other two are nevus. The test image is melanoma and is correctly classified. The explanation is based on the similarity between parts of the test image and the prototypes.
Figure S1: Explanation of the $L_\text{P}$ + $L_\text{M}$ approach for a validation case from the ISIC 2019 dataset, showing the top 3 activated prototypes with the DenseNet-169 backbone. The first prototype is nevus, while the other two are melanoma. The test image is from the nevus class and is correctly classified. For simplicity, only 3 of the 18 prototypes are displayed. The total points assigned to the predicted class, summed across all its prototypes, exceed those of the other class. The point difference between the top two prototypes is substantial, so the most similar prototype heavily influences the decision.
Figure S2: Explanation of the $L_\text{P}$ + $L_\text{M}$ approach for a validation case from the ISIC 2019 dataset, showing the top 3 activated prototypes with the DenseNet-169 backbone. The first prototype is melanoma, while the other two are nevus. The test image is from the melanoma class and is correctly classified.
