Evaluating the Explainability of Attributes and Prototypes for a Medical Classification Model

Luisa Gallée, Catharina Silvia Lisson, Christoph Gerhard Lisson, Daniela Drees, Felix Weig, Daniel Vogele, Meinrad Beer, Michael Götz

TL;DR

This paper investigates the explainability of a human-defined, attribute- and prototype-based AI model (Proto-Caps) for classifying pulmonary nodules in CT images. Using the LIDC-IDRI dataset, the authors deploy Proto-Caps with attribute scores and attribute-specific prototypes, and evaluate explanations through a user study with six radiologists. Results show explanations are subjectively helpful and align with radiologists’ decision criteria, but can inflate trust and confidence when the model errs, underscoring the need for careful presentation and further user-centered validation. The work highlights the potential of interpretable, multimodal explanations to support radiologists as a second opinion while signaling the necessity of iterative, case-specific tailoring and larger-scale studies to ensure safe deployment.

Abstract

Due to the sensitive nature of medicine, it is particularly important and highly demanded that AI methods are explainable. This need has been recognised and there is great research interest in xAI solutions with medical applications. However, there is a lack of user-centred evaluation regarding the actual impact of the explanations. We evaluate attribute- and prototype-based explanations with the Proto-Caps model. This xAI model justifies its target classification with human-defined visual features of the target object, presented as scores and attribute-specific prototypes. The model thus provides a multimodal explanation that is intuitively understandable to humans thanks to predefined attributes. A user study involving six radiologists shows that the explanations are subjectively perceived as helpful, as they reflect the radiologists' own decision-making process. The results of the model are considered a second opinion that radiologists can discuss using the model's explanations. However, the inclusion of model explanations, and an increase in their extent, was shown to objectively increase confidence in the model's predictions even when the model is incorrect. We can conclude that attribute scores and visual prototypes enhance confidence in the model. However, additional development and repeated user studies are needed to tailor the explanation to the respective use case.

Paper Structure (20 sections, 4 figures)

Figures (4)

  • Figure 1: Proto-Caps architecture. A capsule network produces encapsulated representations of visually descriptive features. The prototype layer iteratively constructs a set of prototypes, each covering a single attribute. During inference, the latent vectors of the closest attribute prototypes are concatenated and passed to a dense layer that predicts the target score. The concatenated vector is also fed into a decoder network that reconstructs the region of interest, which benefits training [gallee2023interpretable].
  • Figure 2: Test cases. Sample test case from the questionnaire. The pulmonary sample is shown in a section of the lung and in a crop-out view. Depending on the variant, (A) only the model's malignancy prediction is given, (B) the model-predicted attribute scores are additionally given, or (C) prototypical samples of the attributes are additionally given.
  • Figure 3: Users' performance. Analysis of the radiologists' performance on the test cases, measured in Within-1-Accuracy [gallee2023interpretable]. The green boxplots depict the radiologists' diagnostic accuracy when the model prediction was correct, while the red boxplots show their accuracy when the model was incorrect.
  • Figure 4: Trust in model prediction. Confidence scores in the model's predictions during the test cases, shown with respect to the model's correctness and the level of explainability.
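
The Figure 3 caption names Within-1-Accuracy without defining it on this page. The sketch below shows one common reading of such a metric, under the assumption that a rating counts as correct when it lies at most one point from the reference score on an ordinal scale. The function name `within_one_accuracy`, the `tolerance` parameter, and the example scores are hypothetical and are not taken from the paper.

```python
def within_one_accuracy(predictions, labels, tolerance=1):
    """Fraction of ratings that lie within `tolerance` of the reference score.

    Assumption: scores are ordinal integers (for example, a 1-5 malignancy
    scale) and a deviation of at most `tolerance` still counts as a hit.
    """
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must have the same length")
    if not predictions:
        raise ValueError("at least one case is required")
    # Count the cases whose absolute deviation stays within the tolerance.
    hits = sum(abs(p - y) <= tolerance for p, y in zip(predictions, labels))
    return hits / len(predictions)


# Hypothetical ratings, for illustration only (not data from the study).
reader_scores = [1, 3, 4, 5, 2]
reference_scores = [2, 3, 2, 5, 4]
accuracy = within_one_accuracy(reader_scores, reference_scores)
```

With these illustrative values, three of the five ratings fall within one point of the reference, so `accuracy` is 0.6. Setting `tolerance=0` reduces the function to plain exact-match accuracy.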