Concept Complement Bottleneck Model for Interpretable Medical Image Diagnosis

Hongmei Wang, Junlin Hou, Hao Chen

TL;DR

The paper tackles interpretability in medical image diagnosis by introducing the Concept Complement Bottleneck Model (CCBM), which augments a predefined concept set with learnable unknown concepts, using concept adapters and cross-attention to narrow the gap to black-box models. It combines textual known concepts, encoded by a frozen text encoder, with visual features through per-concept adapters and a multi-head cross-attention mechanism, jointly optimizing disease prediction and concept detection while letting unknown concepts complement the predefined set. The training objective has three terms: a classification loss \(\mathcal{L}_{ce}\), a concept-detection loss \(\mathcal{L}_{cep}\), and a similarity loss \(\mathcal{L}_{sim}\) that diversifies the unknown concepts. A final prediction layer fuses the known and unknown concept scores. Experiments on Derm7pt, Skincon, BrEaST, and LIDC-IDRI show that CCBM achieves state-of-the-art concept detection and competitive disease diagnosis across modalities, with visual and textual explanations and faithfulness analyses of its interpretability. By discovering supplementary concepts automatically, the work may reduce reliance on exhaustive concept annotations in medical imaging.
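As a rough illustration of how the three loss terms named above might be combined, the following is a minimal sketch and not the authors' implementation. The weights `w_cep` and `w_sim`, the use of binary cross-entropy for concept detection, and the mean absolute pairwise cosine similarity as the form of \(\mathcal{L}_{sim}\) are all assumptions made for this example; the summary does not specify them.

```python
import math

def cross_entropy(logits, label):
    """Softmax cross-entropy for one sample (stand-in for L_ce)."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_z - logits[label]

def concept_bce(scores, targets, eps=1e-7):
    """Mean binary cross-entropy over known concepts (assumed form of L_cep)."""
    total = 0.0
    for s, t in zip(scores, targets):
        s = min(max(s, eps), 1.0 - eps)  # clamp to avoid log(0)
        total -= t * math.log(s) + (1 - t) * math.log(1.0 - s)
    return total / len(scores)

def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def similarity_penalty(unknown_embeddings):
    """Mean absolute pairwise cosine similarity (assumed form of L_sim).

    Lower values mean the unknown concept embeddings are more diverse.
    """
    n = len(unknown_embeddings)
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    if not pairs:
        return 0.0
    return sum(abs(cosine(unknown_embeddings[i], unknown_embeddings[j]))
               for i, j in pairs) / len(pairs)

def total_loss(logits, label, scores, targets, unknown_embeddings,
               w_cep=1.0, w_sim=1.0):
    """Weighted sum of the three terms; the weights are hypothetical."""
    return (cross_entropy(logits, label)
            + w_cep * concept_bce(scores, targets)
            + w_sim * similarity_penalty(unknown_embeddings))
```

In this sketch, setting `w_sim` to zero removes the diversity pressure, so nothing stops the unknown concepts from collapsing onto one another. The similarity term is there to discourage that.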

Abstract

Models based on human-understandable concepts have received extensive attention as a way to improve model interpretability for trustworthy artificial intelligence in medical image analysis. These methods can provide convincing explanations for model decisions but rely heavily on detailed annotation of pre-defined concepts. Consequently, they may not be effective when concepts or annotations are incomplete or of low quality. Some methods automatically discover new, effective visual concepts instead of using pre-defined ones, or find human-understandable concepts via large language models, but they are prone to veering away from medical diagnostic evidence and are challenging to understand. In this paper, we propose a concept complement bottleneck model for interpretable medical image diagnosis with the aim of complementing the existing concept set and finding new concepts bridging the gap between explainable models. Specifically, we propose to use concept adapters for specific concepts to mine the concept differences and to score concepts in their own attention channels, supporting nearly fair concept learning. Then, we devise a concept complement strategy to learn new concepts while jointly using known concepts to improve model performance. Comprehensive experiments on medical datasets demonstrate that our model outperforms state-of-the-art competitors in concept detection and disease diagnosis tasks while providing diverse explanations that effectively ensure model interpretability.

Paper Structure

This paper contains 22 sections, 11 equations, 5 figures, 4 tables.

Figures (5)

  • Figure 1: The framework of the Concept Complement Bottleneck Model. The input images are passed to the image encoder to obtain the fundamental features, then different concept adapters extract concept-specific features. Next, CCBM calculates the visual-text cross-attention score between the textual known-concept or unknown-concept embeddings and the concept visual features. Finally, these concept attention scores are aggregated and passed through the decision layer for the final disease diagnosis.
  • Figure 2: The fine-grained results of the concept detection task on the Derm7pt, BrEaST and Skincon datasets. The results are the means and stds of five-fold cross-validation experiments. The "Avg" is the mean and variance of AUC over all concepts.
  • Figure 3: Inference-time intervention results. The $x$-axis represents the thresholds ($t_1 \leq t_2 \leq \dots \leq t_8$), and the $y$-axis represents the diagnosis performance after intervention.
  • Figure 4: Label efficiency experiment results. The $x$-axis and $y$-axis represent the training proportion and diagnosis performance, respectively.
  • Figure 5: Visual and textual explanations of two images from Derm7pt and LIDC-IDRI, respectively. We visualize the known concepts and unknown concepts (named C1 and C2) of these two examples. The value in brackets represents the ground-truth label of the concept. (For Derm7pt, only the existing concepts are shown.) Concept scores in red boxes indicate incorrectly predicted concepts. In the textual explanations, green text denotes correct predictions, while red text highlights incorrect predictions. Blue text presents the insights from the learned unknown concepts.
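
The pipeline described in the Figure 1 caption (concept adapters, visual-text cross-attention, score aggregation, decision layer) can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the paper's implementation: it uses single-head rather than multi-head attention, treats each adapter as a plain linear map, and derives the concept score as a sigmoid of the query-pooled-feature dot product. All function and parameter names (`concept_score`, `diagnose`, `decision_weights`, and so on) are hypothetical.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [dot(row, x) for row in W]

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    z = sum(exps)
    return [e / z for e in exps]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def concept_score(query, tokens, adapter):
    """Score one concept from image tokens (assumed scoring rule).

    query   : concept embedding (text embedding for a known concept,
              learnable vector for an unknown concept)
    tokens  : visual feature vectors from the image encoder
    adapter : per-concept linear map applied to every token
    """
    adapted = [matvec(adapter, t) for t in tokens]
    scale = math.sqrt(len(query))
    attn = softmax([dot(query, t) / scale for t in adapted])
    # Attention-weighted pooling of the adapted tokens.
    pooled = [sum(a * t[d] for a, t in zip(attn, adapted))
              for d in range(len(query))]
    return sigmoid(dot(query, pooled)), attn

def diagnose(tokens, known_queries, unknown_queries, adapters, decision_weights):
    """Fuse known and unknown concept scores through a linear decision layer."""
    queries = known_queries + unknown_queries
    scores = [concept_score(q, tokens, A)[0] for q, A in zip(queries, adapters)]
    class_probs = softmax(matvec(decision_weights, scores))
    return scores, class_probs
```

Because the decision layer in this sketch is linear over concept scores, each class logit decomposes into per-concept contributions, which is what makes concept-level explanations and interventions straightforward to read off.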