Aligning Human Knowledge with Visual Concepts Towards Explainable Medical Image Classification

Yunhe Gao, Difei Gu, Mu Zhou, Dimitris Metaxas

TL;DR

This study develops an explainable model that mimics the decision-making process of human experts by incorporating domain knowledge in the form of explicit diagnostic criteria. It introduces Explicd, a simple yet effective framework towards Explainable language-informed criteria-based diagnosis.

Abstract

Although explainability is essential in clinical diagnosis, most deep learning models still function as black boxes that do not reveal their decision-making process. In this study, we investigate how to develop an explainable model that mimics the decision-making process of human experts by incorporating domain knowledge in the form of explicit diagnostic criteria. We introduce a simple yet effective framework, Explicd, towards Explainable language-informed criteria-based diagnosis. Explicd first queries domain knowledge from either large language models (LLMs) or human experts to establish diagnostic criteria along several concept axes (e.g., color, shape, texture, or specific patterns of diseases). Using a pretrained vision-language model, Explicd injects these criteria into the embedding space as knowledge anchors, which guides the learning of the corresponding visual concepts in medical images. The final diagnosis is determined from the similarity scores between the encoded visual concepts and the textual criteria embeddings. In extensive evaluations on five medical image classification benchmarks, Explicd demonstrates inherent explainability and also improves classification performance over traditional black-box models. Code is available at https://github.com/yhygao/Explicd.
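The prediction step described in the abstract can be illustrated with a small sketch: one visual concept token per criteria axis is compared against the textual criteria embeddings of that axis by cosine similarity, and the resulting scores feed a linear classifier. This is only a toy illustration under stated assumptions, not the authors' implementation. The function names (`cosine_similarity`, `alignment_scores`, `linear_predict`), the 3-dimensional embeddings, the two axes, and the hand-picked weights are all hypothetical; in the actual framework the embeddings would come from a pretrained vision-language model and the linear weights would be learned.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def alignment_scores(visual_tokens, criteria_anchors):
    """Score each axis's visual token against every textual anchor on that axis.

    Returns one flat list of similarity scores, ordered axis by axis.
    """
    scores = []
    for axis, token in visual_tokens.items():
        for anchor in criteria_anchors[axis]:
            scores.append(cosine_similarity(token, anchor))
    return scores


def linear_predict(scores, weights, biases):
    """Apply a linear layer to the scores and return (argmax class, logits)."""
    logits = [
        sum(w * s for w, s in zip(row, scores)) + b
        for row, b in zip(weights, biases)
    ]
    return logits.index(max(logits)), logits


# Hypothetical textual criteria embeddings (knowledge anchors), two per axis.
criteria_anchors = {
    "color": [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]],
    "shape": [[0.0, 0.0, 1.0], [1.0, 1.0, 0.0]],
}

# Hypothetical encoded visual concept tokens, one per axis.
visual_tokens = {
    "color": [0.9, 0.1, 0.0],
    "shape": [0.1, 0.0, 0.8],
}

scores = alignment_scores(visual_tokens, criteria_anchors)

# Hand-picked weights for two classes: class 0 relies on the first criterion
# of each axis, class 1 on the second. Real weights would be trained.
weights = [[1.0, 0.0, 1.0, 0.0], [0.0, 1.0, 0.0, 1.0]]
biases = [0.0, 0.0]

predicted_class, logits = linear_predict(scores, weights, biases)
```

Because each input to the linear layer is the similarity to one named criterion, the contribution of every criterion to a class logit can be read off directly as weight times score, which is where the explainability of this kind of design comes from.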

Paper Structure

This paper contains 13 sections, 4 equations, 2 figures, and 1 table.

Figures (2)

  • Figure 1: (a) Current state-of-the-art deep learning models often function as black boxes, offering predictions without revealing insights into their decision-making processes. (b) The decision-making process of human experts for skin lesions is grounded in domain-specific knowledge and precise criteria, which makes the diagnosis explainable. (c) Overview of our Explicd framework: Domain knowledge is queried from LLMs or human experts across criteria axes. Explicd then aligns encoded visual concepts with textual knowledge anchors, facilitating the learning of visual concepts. The final diagnostic prediction is computed by a linear function of the alignment scores between visual and textual concepts.
  • Figure 2: (a) Alignment scores measured using cosine similarity between the encoded visual concept tokens and diagnostic criteria along each axis for skin lesion classification. The width of the lines represents the strength of similarity, with wider lines indicating higher scores. (b) Heatmap visualization of the encoded visual concept tokens overlaid on the image feature maps for a case of cardiomegaly. Brighter regions indicate higher similarity scores, suggesting a stronger focus on these areas by the model.
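A heatmap like the one in Figure 2(b) could be produced along the following lines. The caption does not spell out the exact computation, so this sketch rests on an assumption: each spatial location of the image feature map is scored by its cosine similarity to one visual concept token, and the scores are min-max rescaled to [0, 1] for display. The function name `concept_heatmap`, the 2x2 feature grid, and the 2-dimensional features are hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def concept_heatmap(patch_features, concept_token):
    """Return a grid of similarities to the concept token, rescaled to [0, 1].

    patch_features is a grid (list of rows) of feature vectors, one per
    spatial location of the image feature map.
    """
    raw = [
        [cosine_similarity(patch, concept_token) for patch in row]
        for row in patch_features
    ]
    flat = [value for row in raw for value in row]
    lowest, highest = min(flat), max(flat)
    span = highest - lowest
    if span == 0.0:
        # All locations are equally similar; return a flat map.
        return [[0.0 for _ in row] for row in raw]
    return [[(value - lowest) / span for value in row] for row in raw]


# Hypothetical 2x2 feature map with 2-dimensional features.
patch_features = [
    [[1.0, 0.0], [0.8, 0.2]],
    [[0.0, 1.0], [0.1, 0.9]],
]
# Hypothetical visual concept token.
concept_token = [0.0, 1.0]

heatmap = concept_heatmap(patch_features, concept_token)
```

In this toy grid the bottom row aligns with the concept token and would appear brighter, matching the caption's reading that brighter regions indicate higher similarity. A real visualization would upsample the grid to the image resolution before overlaying it.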