A Concept-based Interpretable Model for the Diagnosis of Choroid Neoplasias using Multimodal Data

Yifan Wu, Yang Liu, Yue Yang, Michael S. Yao, Wenli Yang, Xuehui Shi, Lihong Yang, Dongjun Li, Yueming Liu, James C. Gee, Xuan Yang, Wenbin Wei, Shi Gu

TL;DR

A multimodal concept-based interpretable model (MMCBM) is developed to distinguish uveal melanoma from hemangioma and metastatic carcinoma, achieving performance comparable to that of senior ophthalmologists on a large cohort of Asian patients with choroidal neoplasms.

Abstract

Diagnosing rare diseases is a common challenge in clinical practice, because accurate identification requires the expertise of specialists. Machine learning offers a promising solution, but its development is hindered by the scarcity of data on rare conditions and by the need for models that are both interpretable and trustworthy in a clinical context. Interpretable AI, with its capacity for human-readable outputs, can facilitate validation by clinicians and contribute to medical education. In this work, we focus on choroid neoplasias, the most prevalent form of eye cancer in adults, albeit rare, with an incidence of 5.1 per million. We built the largest dataset to date, comprising 750 patients and three distinct imaging modalities collected from 2004 to 2022. Our work introduces a concept-based interpretable model that distinguishes between three types of choroidal tumors, integrating insights from domain experts via radiological reports. Remarkably, this model not only achieves an F1 score of 0.91, rivaling that of black-box models, but also boosts the diagnostic accuracy of junior doctors by 42%. This study highlights the potential of interpretable machine learning for improving the diagnosis of rare diseases and lays the groundwork for medical AI that could tackle a wider array of complex health scenarios.
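The abstract describes a concept-based model whose predictions pass through human-readable concepts. The following is a minimal, hypothetical sketch of such a concept bottleneck, not the authors' implementation: it assumes each concept score is the cosine similarity between an encoded image feature and a concept activation vector, and that a linear classifier over these scores produces the class prediction. The function names, array shapes, and weights are illustrative assumptions.

```python
import numpy as np


def concept_scores(features: np.ndarray, cavs: np.ndarray) -> np.ndarray:
    """Cosine similarity between each image feature and each concept vector.

    features: (n_images, d) encoded image features
    cavs:     (n_concepts, d) concept activation vectors (the concept bank)
    returns:  (n_images, n_concepts) concept scores in [-1, 1]
    """
    f = features / np.linalg.norm(features, axis=1, keepdims=True)
    c = cavs / np.linalg.norm(cavs, axis=1, keepdims=True)
    return f @ c.T


def predict(scores: np.ndarray, weight: np.ndarray, bias: np.ndarray, k: int = 3):
    """Linear classifier on top of the concept scores (the bottleneck).

    weight: (n_classes, n_concepts) hypothetical learned weights
    bias:   (n_classes,)
    Returns class probabilities (n_images, n_classes) and the indices of the
    k highest-scoring concepts per image, in descending order of score.
    """
    logits = scores @ weight.T + bias
    z = logits - logits.max(axis=1, keepdims=True)  # numerically stable softmax
    probs = np.exp(z) / np.exp(z).sum(axis=1, keepdims=True)
    top_k = np.argsort(-scores, axis=1)[:, :k]
    return probs, top_k
```

Usage, with toy random inputs: `scores = concept_scores(features, cavs)` followed by `probs, top_k = predict(scores, weight, bias)`. Because the class prediction depends on the images only through `scores`, an intervention such as the concept-score adjustment panel in Figure 5(b) amounts, in this sketch, to editing an entry of `scores` and calling `predict` again.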

Paper Structure

This paper contains 5 sections, 1 equation, and 15 figures.

Figures (15)

  • Figure 1: Overview of the MMCBM workflow. (a) A large language model (LLM) extracts image-concept pairs from comprehensive medical reports to formulate the concept banks; senior experts then check the faithfulness of these pairs and correct them where needed. (b) From these pairs, we construct the concept bank by learning concept activation vectors. (c) At inference, the model takes a series of images spanning one to three modalities. A pre-trained image encoder converts these images into tokenized features, from which concept scores are computed. The model then delivers an explainable prediction that highlights the diagnostic evidence, and it also generates an interpretive report, enhancing the transparency of the diagnostic process.
  • Figure 2: Statistics of the CTI Dataset. (a) The CTI dataset is composed of 750 patients: 542 with melanoma, 128 with hemangioma, and 80 with metastatic carcinoma, collected from 2004 to 2022. (b) Proportions of patients with hemangioma, metastatic carcinoma, and melanoma imaged by Fluorescein Angiography (FA), Indocyanine Green Angiography (ICGA), and Ultrasound (US). (c) Split of imaging studies in the training and test datasets across various imaging modalities: 20% of the Multi-Modal data (MM), representing patients imaged with all three modalities, is set aside for testing. The remaining 80% of MM and all non-MM data are allocated for training using 5-fold cross-validation.
  • Figure 3: Multimodal Medical Concept Bottleneck Model (MMCBM). Black-box models such as the pre-trained classifier learn directly from the encoded image features and output a single prediction without any insight into how it was computed. In contrast, the MMCBM shown in (a) represents encoded features by their alignment with key medical concepts derived from domain experts. This allows MMCBM to return not only its prediction but also the top-$k$ activated concepts that best describe the input images, giving insight into how the model arrived at its prediction. Comparing classification (b) accuracy and (c) sensitivity, there is no statistically significant difference between the black-box models and MMCBM for any set of imaging inputs. MMCBM concepts also outperform features derived from CLIP-based models, highlighting the importance of sourcing prior knowledge from domain experts. (d) Performance Benchmark with Human Evaluators: a comparison of our model's performance against junior and senior doctors. After being shown the model's predicted concepts, the doctors performed a second assessment, allowing us to record and compare their performance metrics. (e) Confusion Matrices for Human Evaluators with and without Concepts, for the Junior, Junior & Model, Senior, and Senior & Model groups, each shown in its corresponding color.
  • Figure 4: Comparative Human Evaluation and Model Insights. (a) Embedding Visualizations via t-SNE: a graphical representation of the embeddings from the three pretrained encoders. Notably, the fused MM embeddings are processed through the attention-pooling mechanism. (b) Accuracy of the SVMs used to generate the concept banks via Concept Activation Vectors (CAVs). (c) Metrics of the predicted top-k concepts on the test dataset with k = 10, including precision@k, recall@k, F1@k, mean rank@k, median rank@k, and mean reciprocal rank@k.
  • Figure 5: Demo of the Human Interactive Interface. We built a website to facilitate the interactive user study with ophthalmologists. (a) Image Display Panel: because FA and ICGA imaging span multiple time frames, ophthalmologists select images from the early, middle, and late phases for accurate classification. (b) Intervention interface on the concept bottleneck: a panel that allows adjustment of the concept scores to refine the final prediction. (c) Visual Emphasis on Bottlenecks: a curated selection of representative cases processed by the model, highlighting the top-k concepts ranked by their attention scores in the weight matrix across the three tumor classes. (d) Diagnostic Reporting in Action: an example of a diagnostic report generated by ChatGPT during the testing phase. The input to ChatGPT combines the predicted top-k concepts with patient-specific details, illustrating the model's capability to produce interpretable diagnoses.
  • ...and 10 more figures
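Figures 1(b) and 4(b) indicate that each concept is represented by a concept activation vector (CAV) learned with an SVM. A minimal sketch, assuming a linear SVM from scikit-learn trained to separate embeddings of concept-positive images from concept-negative ones, with the unit normal of the decision boundary taken as the CAV. The 80/20 held-out split, the `C` value, and the function name are illustrative assumptions, not details from the paper.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.svm import LinearSVC


def learn_cav(pos: np.ndarray, neg: np.ndarray, C: float = 1.0, seed: int = 0):
    """Learn one concept activation vector (CAV) with a linear SVM.

    pos: (n_pos, d) embeddings of images paired with the concept
    neg: (n_neg, d) embeddings of images without the concept
    Returns the unit-norm CAV (normal of the SVM decision boundary, pointing
    toward the concept-positive side) and the held-out SVM accuracy.
    """
    X = np.vstack([pos, neg])
    y = np.concatenate([np.ones(len(pos)), np.zeros(len(neg))])
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.2, random_state=seed, stratify=y
    )
    clf = LinearSVC(C=C)
    clf.fit(X_tr, y_tr)
    w = clf.coef_[0]  # shape (d,) for a binary problem
    cav = w / np.linalg.norm(w)
    return cav, clf.score(X_te, y_te)
```

Repeating this once per concept and stacking the resulting vectors gives a concept bank of shape (n_concepts, d); the held-out accuracy plays the role of the per-concept SVM accuracy plotted in Figure 4(b).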
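Figure 4(c) evaluates the predicted top-k concepts with ranking metrics, but the caption does not define them. This sketch assumes common retrieval definitions: precision@k is the fraction of the k predicted concepts that are correct, recall@k is the fraction of ground-truth concepts recovered, ranks are the 1-based positions of correct concepts within the top k, and mean reciprocal rank averages 1/rank of the first correct concept across test cases. The function names are hypothetical.

```python
from statistics import mean, median


def topk_metrics(predicted: list[str], relevant: set[str], k: int) -> dict[str, float]:
    """Retrieval-style metrics for one test case.

    predicted: unique concept names ranked by the model, best first
    relevant:  ground-truth concepts for this case (e.g. from its report)
    """
    top = predicted[:k]
    ranks = [i + 1 for i, c in enumerate(top) if c in relevant]  # 1-based ranks of hits
    precision = len(ranks) / k
    recall = len(ranks) / len(relevant) if relevant else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "precision@k": precision,
        "recall@k": recall,
        "f1@k": f1,
        "mean_rank@k": mean(ranks) if ranks else float("nan"),
        "median_rank@k": median(ranks) if ranks else float("nan"),
        "reciprocal_rank@k": 1.0 / ranks[0] if ranks else 0.0,
    }


def mean_reciprocal_rank(cases: list[tuple[list[str], set[str]]], k: int) -> float:
    """Average the reciprocal rank of the first relevant concept over all cases."""
    return mean(topk_metrics(p, r, k)["reciprocal_rank@k"] for p, r in cases)
```

For example, with predictions `["a", "b", "c", "d"]`, ground truth `{"b", "d", "e"}`, and k = 4, two of the four predictions are correct, so precision@k is 0.5 and recall@k is 2/3.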
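Figure 2(c) describes the data split: 20% of the multimodal (MM) data is held out for testing, and the remaining MM data plus all non-MM data are used for 5-fold cross-validation. The sketch below assumes the split is made at the patient level, so that no patient appears in both the training and test sets, and that it is not stratified by tumor class; both are assumptions, and stratifying by class would be a natural extension. The function names are hypothetical.

```python
import random


def split_patients(mm_ids, non_mm_ids, test_frac=0.2, n_folds=5, seed=0):
    """Hold out a fraction of multimodal (MM) patients for testing and split
    everyone else into cross-validation folds.

    mm_ids:     patients imaged with all three modalities (FA, ICGA, US)
    non_mm_ids: patients imaged with only one or two modalities
    """
    rng = random.Random(seed)
    mm = list(mm_ids)
    rng.shuffle(mm)
    n_test = round(len(mm) * test_frac)
    test = mm[:n_test]
    pool = mm[n_test:] + list(non_mm_ids)  # remaining MM plus all non-MM patients
    rng.shuffle(pool)
    folds = [pool[i::n_folds] for i in range(n_folds)]
    return test, folds


def cv_splits(folds):
    """Yield (train, validation) patient lists, one pair per fold."""
    for i, val in enumerate(folds):
        train = [pid for j, fold in enumerate(folds) if j != i for pid in fold]
        yield train, val
```

Keeping the test set restricted to MM patients means every test case can be evaluated with any subset of one to three modalities, while the single-modality and two-modality patients still contribute to training.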
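Figure 4(a) notes that the fused MM embeddings pass through an attention-pooling mechanism. The paper's exact pooling layer is not described here, so this is a generic single-query attention-pooling sketch, assuming scaled dot-product scores between a learned query vector (hypothetical) and one token per input image or modality. Because the softmax is taken over however many tokens are present, the same function handles inputs with one, two, or three modalities.

```python
import numpy as np


def attention_pool(tokens: np.ndarray, query: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Fuse a variable number of modality tokens into one embedding.

    tokens: (n_tokens, d) encoder features from the available modalities
    query:  (d,) learned query vector (hypothetical)
    Returns the pooled (d,) embedding and the (n_tokens,) attention weights.
    """
    logits = tokens @ query / np.sqrt(tokens.shape[1])  # scaled dot-product scores
    w = np.exp(logits - logits.max())                    # numerically stable softmax
    w /= w.sum()
    return w @ tokens, w
```

With a single token the weight is 1 and the pooled embedding equals that token, so a single-modality input passes through unchanged in this sketch.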