MICA: Towards Explainable Skin Lesion Diagnosis via Multi-Level Image-Concept Alignment

Yequan Bie, Luyang Luo, Hao Chen

TL;DR

MICA targets explainable skin lesion diagnosis by combining multi-level image-concept alignment with a concept bottleneck. It pairs a CNN image encoder with an LLM-based concept encoder and aligns their representations at the image (global), token (regional), and concept (subspace) levels, which lets the model give both visual and textual explanations. On three skin image datasets, the method reports high performance and label efficiency for concept detection and disease diagnosis while preserving interpretability, and it supports test-time concept intervention so that explanations stay faithful to the prediction. The approach is a step toward practical ante-hoc XAI in dermatology and could improve clinician trust and diagnostic efficiency through human-interpretable reasoning and adjustable, concept-driven decisions.

Abstract

Black-box deep learning approaches have showcased significant potential in the realm of medical image analysis. However, the stringent trustworthiness requirements intrinsic to the medical field have catalyzed research into the utilization of Explainable Artificial Intelligence (XAI), with a particular focus on concept-based methods. Existing concept-based methods predominantly apply concept annotations from a single perspective (e.g., global level), neglecting the nuanced semantic relationships between sub-regions and concepts embedded within medical images. This leads to underutilization of the valuable medical information and may cause models to fall short in harmoniously balancing interpretability and performance when employing inherently interpretable architectures such as Concept Bottlenecks. To mitigate these shortcomings, we propose a multi-modal explainable disease diagnosis framework that meticulously aligns medical images and clinical-related concepts semantically at multiple strata, encompassing the image level, token level, and concept level. Moreover, our method allows for model intervention and offers both textual and visual explanations in terms of human-interpretable concepts. Experimental results on three skin image datasets demonstrate that our method, while preserving model interpretability, attains high performance and label efficiency for concept detection and disease diagnosis.
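
The model intervention mentioned in the abstract can be illustrated with a toy concept bottleneck. The sketch below is not the authors' implementation: it assumes concept scores are cosine similarities between an image embedding and concept embeddings, and that the diagnosis is a linear function of those scores only. All concept names, class names, embeddings, and weights are hypothetical values chosen for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def concept_scores(image_emb, concept_embs):
    """Score each concept by its similarity to the image embedding."""
    return {name: cosine(image_emb, emb) for name, emb in concept_embs.items()}


def diagnose(scores, weights, bias):
    """Linear bottleneck: class logits depend only on the concept scores."""
    logits = {
        label: bias[label] + sum(w[c] * scores[c] for c in scores)
        for label, w in weights.items()
    }
    return max(logits, key=logits.get), logits


def intervene(scores, corrections):
    """Return a copy of the scores with expert-supplied values substituted."""
    updated = dict(scores)
    updated.update(corrections)
    return updated


# Hypothetical toy setup: two concepts, two diagnoses, 2-D embeddings.
concept_embs = {
    "atypical_pigment_network": [1.0, 0.0],
    "regular_streaks": [0.0, 1.0],
}
weights = {
    "melanoma": {"atypical_pigment_network": 2.0, "regular_streaks": -1.0},
    "nevus": {"atypical_pigment_network": -1.0, "regular_streaks": 2.0},
}
bias = {"melanoma": 0.0, "nevus": 0.0}

image_emb = [0.6, 0.8]  # stand-in for the image encoder's output
scores = concept_scores(image_emb, concept_embs)
prediction, _ = diagnose(scores, weights, bias)

# A clinician disagrees with the detected concepts and overrides them.
corrected = intervene(
    scores, {"atypical_pigment_network": 1.0, "regular_streaks": 0.0}
)
prediction_after, _ = diagnose(corrected, weights, bias)
```

Because the classifier sees only the concept scores, editing a score at test time changes the diagnosis in a traceable way. In this toy setup the prediction moves from `nevus` to `melanoma` once the concepts are corrected.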

Paper Structure (26 sections, 9 equations, 3 figures, 5 tables)

Figures (3)

  • Figure 1: Our method learns image and concept semantic correspondences at the image, token, and concept levels.
  • Figure 2: The overall pipeline of our proposed framework.
  • Figure 3: Illustration of our model's faithfulness, plausibility, and understandability. (a, b) Test-time concept intervention examples and results. (c) Examples of visual and textual explanations provided by our method for skin images from different datasets. Correct predictions are marked in green and incorrect predictions in red.