Evaluating the Explainability of Attributes and Prototypes for a Medical Classification Model
Luisa Gallée, Catharina Silvia Lisson, Christoph Gerhard Lisson, Daniela Drees, Felix Weig, Daniel Vogele, Meinrad Beer, Michael Götz
TL;DR
This paper investigates the explainability of a human-defined, attribute- and prototype-based AI model (Proto-Caps) for classifying pulmonary nodules in CT images. Using the LIDC-IDRI dataset, the authors apply Proto-Caps with attribute scores and attribute-specific prototypes and evaluate its explanations in a user study with six radiologists. The results show that the explanations are subjectively perceived as helpful and align with the radiologists' decision criteria, but that they can inflate trust and confidence when the model errs. This underscores the need for careful presentation and further user-centred validation. The work highlights the potential of interpretable, multimodal explanations to support radiologists as a second opinion, while signalling the need for iterative, case-specific tailoring and larger-scale studies to ensure safe deployment.
Abstract
Due to the sensitive nature of medicine, it is particularly important that AI methods are explainable, and there is high demand for this. The need has been recognised, and there is great research interest in xAI solutions for medical applications. However, there is a lack of user-centred evaluation of the actual impact of the explanations. We evaluate attribute- and prototype-based explanations with the Proto-Caps model. This xAI model justifies its target classification with human-defined visual features of the target object, in the form of attribute scores and attribute-specific prototypes. The model thus provides a multimodal explanation that is intuitively understandable to humans thanks to the predefined attributes. A user study involving six radiologists shows that the explanations are subjectively perceived as helpful, as they reflect the radiologists' own decision-making process. The radiologists regard the model's results as a second opinion that they can discuss using the model's explanations. However, the study also showed that including model explanations, and increasing their extent, can objectively increase confidence in the model's predictions even when the model is incorrect. We conclude that attribute scores and visual prototypes enhance confidence in the model. However, further development and repeated user studies are needed to tailor the explanations to the respective use case.
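The abstract describes an explanation made of predicted attribute scores plus one prototype per attribute. The sketch below shows how such an explanation could be assembled. It is not the Proto-Caps implementation and is not taken from the paper. It assumes that each attribute has its own latent vector and its own bank of learned prototype vectors, and that the prototype shown to the user is the one at the smallest Euclidean distance. The names `Prototype`, `explain` and the prototype identifiers are hypothetical.

```python
from __future__ import annotations

import math
from dataclasses import dataclass


@dataclass
class Prototype:
    # Assumption: a learned latent vector plus a (hypothetical) identifier
    # of the training image patch that visualises this prototype.
    vector: list[float]
    image_id: str


def euclidean(a: list[float], b: list[float]) -> float:
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def explain(
    attribute_latents: dict[str, list[float]],
    attribute_scores: dict[str, float],
    prototype_bank: dict[str, list[Prototype]],
) -> dict[str, tuple[float, str]]:
    """For each attribute, pair the predicted score with the nearest prototype.

    Assumption: prototypes are compared only against the latent vector of
    their own attribute (attribute-specific prototypes).
    """
    explanation = {}
    for attr, latent in attribute_latents.items():
        nearest = min(prototype_bank[attr], key=lambda p: euclidean(latent, p.vector))
        explanation[attr] = (attribute_scores[attr], nearest.image_id)
    return explanation
```

A toy call, with invented vectors and scores for two LIDC-IDRI attributes, might look like `explain({"spiculation": [0.9, 0.1]}, {"spiculation": 3.8}, {"spiculation": [Prototype([0.0, 1.0], "proto_a"), Prototype([1.0, 0.0], "proto_b")]})`, which returns `{"spiculation": (3.8, "proto_b")}`. Keeping each attribute's prototype search separate mirrors the idea that every attribute is explained on its own terms rather than through a single global nearest neighbour.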
