
CoPA: Hierarchical Concept Prompting and Aggregating Network for Explainable Diagnosis

Yiheng Dong, Yi Lin, Xin Yang

TL;DR

CoPA addresses the interpretability gap in medical image diagnosis by capturing fine-grained, multiscale concepts across multiple encoder layers. It introduces a Concept-aware Embedding Generator (CEG) that distills concept representations from each layer, and Concept Prompt Tuning (CPT), which uses those representations to guide feature extraction without degrading the pretrained backbone. The layer-wise representations are then combined by gated aggregation and aligned contrastively with textual concept embeddings. The approach achieves state-of-the-art results on three dermoscopy/clinical datasets and provides faithful, understandable, and plausible explanations through concept heatmaps and a transparent prediction workflow. The framework demonstrates that hierarchical concept prompting and aggregation can improve both diagnostic accuracy and interpretability in medical imaging, which matters for clinical deployment.
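
The contrastive cross-modal alignment step pairs each visual concept embedding with the text embedding of the same concept. The following is a minimal sketch that assumes a generic symmetric InfoNCE objective (the CLIP-style loss) over K paired concept embeddings. The function names and the temperature of 0.07 are illustrative choices, and CoPA's actual alignment loss may differ.

```python
import numpy as np

def l2_normalize(x, axis=-1, eps=1e-8):
    """Scale vectors to unit length so dot products become cosine similarities."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)

def log_softmax(logits, axis=-1):
    """Numerically stable log-softmax along the given axis."""
    shifted = logits - logits.max(axis=axis, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=axis, keepdims=True))

def symmetric_info_nce(visual, text, temperature=0.07):
    """Generic symmetric InfoNCE loss (an assumption, not CoPA's exact loss).

    visual, text: arrays of shape (K, D). Row k of each describes concept k,
    so the matching pairs lie on the diagonal of the similarity matrix.
    """
    v = l2_normalize(visual)
    t = l2_normalize(text)
    logits = v @ t.T / temperature                      # (K, K) scaled cosine similarities
    idx = np.arange(len(v))
    loss_v2t = -log_softmax(logits, axis=1)[idx, idx].mean()  # each visual row picks its text
    loss_t2v = -log_softmax(logits, axis=0)[idx, idx].mean()  # each text column picks its visual
    return 0.5 * (loss_v2t + loss_t2v)
```

The loss is low when every visual concept embedding is closer to its own textual concept than to any other concept. Averaging both directions keeps the objective symmetric between the two modalities.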

Abstract

The transparency of deep learning models is essential for clinical diagnostics. Concept Bottleneck Models (CBMs) make diagnostic decisions transparent by mapping the latent space of black-box models onto human-understandable concepts. However, concept-based methods remain limited in their ability to capture concepts. They typically encode features only from the final layer, neglecting shallow and multiscale features, and they lack effective guidance during concept encoding, which hinders fine-grained concept extraction. To address these issues, we introduce Concept Prompting and Aggregating (CoPA), a novel framework designed to capture multilayer concepts under prompt guidance. The framework uses a Concept-aware Embedding Generator (CEG) to extract concept representations from each layer of the visual encoder. These representations simultaneously serve as prompts for Concept Prompt Tuning (CPT), steering the model toward amplifying critical concept-related visual cues. The visual representations from all layers are then aggregated and aligned with textual concept representations. In this way, concept-wise information in the images is captured and used effectively, improving both concept and disease prediction. Extensive experiments demonstrate that CoPA outperforms state-of-the-art methods on three public datasets. Code is available at https://github.com/yihengd/CoPA.
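
A toy forward pass can illustrate how CEG and CPT interact. This is a hypothetical sketch, not the authors' implementation, and it rests on three assumptions:

- CEG behaves like attention pooling with K concept queries.
- The concept embeddings pooled from one layer are prepended as prompt tokens to the next layer.
- Each frozen encoder layer can be approximated by a single-head self-attention block with fixed weights.

All function names (concept_pool, frozen_self_attention, prompted_forward) are invented for illustration.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def concept_pool(tokens, concept_queries):
    """Hypothetical CEG stand-in: each concept query attends over the patch tokens.

    tokens: (N, D), concept_queries: (K, D).
    Returns (K, D) concept embeddings and the (K, N) attention map (a concept heatmap).
    """
    d = tokens.shape[-1]
    attn = softmax(concept_queries @ tokens.T / np.sqrt(d), axis=-1)
    return attn @ tokens, attn

def frozen_self_attention(x, w_qkv):
    """Stand-in for a pretrained encoder layer: single-head self-attention plus a
    residual connection. The weights in w_qkv are only read, never modified."""
    d = x.shape[-1]
    q, k, v = (x @ w for w in w_qkv)
    return x + softmax(q @ k.T / np.sqrt(d), axis=-1) @ v

def prompted_forward(tokens, layers, concept_queries):
    """Run the layers in order. Concepts pooled from each layer's output are
    prepended as prompts to the next layer, and the prompt outputs are dropped
    afterwards so the patch-token sequence keeps its length."""
    per_layer_concepts = []
    prompts = None
    for w_qkv in layers:
        x = tokens if prompts is None else np.vstack([prompts, tokens])
        out = frozen_self_attention(x, w_qkv)
        n_prompt = 0 if prompts is None else prompts.shape[0]
        tokens = out[n_prompt:]                        # discard prompt positions
        prompts, _ = concept_pool(tokens, concept_queries)
        per_layer_concepts.append(prompts)             # one (K, D) set per layer
    return tokens, per_layer_concepts
```

The pretrained weights stay untouched because the concept guidance enters only through extra prompt tokens. In a trainable version of this sketch, only the concept queries (and any CEG parameters) would be updated. Sharing one set of concept queries across all layers is a simplifying assumption.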

Paper Structure

This paper contains 18 sections, 6 equations, 3 figures, 4 tables.

Figures (3)

  • Figure 1: The overall pipeline of CoPA, which consists of a multi-layer visual concept encoder, a concept alignment bottleneck layer, and a gated aggregation module.
  • Figure 2: Intervention Examples. ITV: Intervention
  • Figure 3: Illustration of understandability and plausibility, where PN, DaG, STR, RA, and BWV stand for "Pigment Network", "Dots and Globules", "Streaks", "Regression Area", and "Blue-Whitish Veil", respectively. (a) Heatmap visualization of the regions attended to for each concept. (b) The complete prediction process, including concept visualization, concept alignment scores, gated fusion weights, and diagnosis confidence.
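
The prediction workflow in Figure 3(b) and the interventions in Figure 2 follow the usual concept-bottleneck pattern: compute concept scores first, then apply a simple classifier on top. This minimal sketch makes three assumptions:

- The concept alignment score is the cosine similarity between visual and textual concept embeddings.
- There is one softmax-normalized, input-independent gate per encoder layer.
- The disease head is linear.

CoPA's gate may be input-dependent or per-concept, and all names here are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cosine(a, b):
    """Cosine similarity between corresponding rows of a and b."""
    a = a / np.linalg.norm(a, axis=-1, keepdims=True)
    b = b / np.linalg.norm(b, axis=-1, keepdims=True)
    return (a * b).sum(axis=-1)

def predict(layer_concepts, text_concepts, gate_logits, w_cls, b_cls):
    """layer_concepts: (L, K, D) visual concept embeddings, one set per layer.
    text_concepts: (K, D) text embeddings of the K concepts.
    gate_logits: (L,) hypothetical gate parameters, one per layer.
    w_cls: (K, C), b_cls: (C,) linear head from concept scores to C diseases."""
    scores = np.stack([cosine(c, text_concepts) for c in layer_concepts])  # (L, K)
    gates = softmax(gate_logits)               # (L,) gated fusion weights
    fused = gates @ scores                     # (K,) aggregated concept scores
    probs = softmax(fused @ w_cls + b_cls)     # (C,) diagnosis confidence
    return scores, gates, fused, probs

def intervene(fused, concept_idx, value, w_cls, b_cls):
    """Concept intervention: overwrite one aggregated concept score (for example
    with an expert's judgement) and recompute the diagnosis."""
    edited = fused.copy()
    edited[concept_idx] = value
    return softmax(edited @ w_cls + b_cls)
```

The diagnosis here is a linear function of the fused concept scores. As a result, each concept's contribution to each disease logit can be read directly from w_cls, and editing a single score, as in an intervention, changes the prediction in a traceable way.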