DWARF: Disease-weighted network for attention map refinement

Haozhe Luo, Aurélie Pahud de Mortanges, Oana Inel, Abraham Bernstein, Mauricio Reyes

TL;DR

DWARF tackles interpretability in medical imaging by integrating clinicians into the training loop to refine attention maps via disease-specific guidance. It combines a pretrained Vision-Language Model with disease-specific segmentation heads and cyclic training to align explanations with findings. Across ChestX-Det, CheXlocalize, and Vindr-CXR, DWARF achieves state-of-the-art performance and more trustworthy attention maps, while clinician evaluations indicate higher confidence in AI-assisted classifications. The work also introduces Identity Enhanced Initialization to mitigate shortcut learning and discusses future directions for transferability and few-shot adaptation.

Abstract

The interpretability of deep learning is crucial for evaluating the reliability of medical imaging models and reducing the risks of inaccurate patient recommendations. This study addresses the "human out of the loop" and "trustworthiness" issues in medical image analysis by integrating medical professionals into the interpretability process. We propose a disease-weighted attention map refinement network (DWARF) that leverages expert feedback to enhance model relevance and accuracy. Our method employs cyclic training to iteratively improve diagnostic performance, generating precise and interpretable feature maps. Experimental results demonstrate significant improvements in interpretability and diagnostic accuracy across multiple medical imaging datasets. This approach fosters effective collaboration between AI systems and healthcare professionals, ultimately aiming to improve patient outcomes.

Paper Structure (19 sections, 4 equations, 5 figures, 4 tables, 1 algorithm)

Figures (5)

  • Figure 1: Flow chart of fine-tuning the classification model. Our method trains on a single disease per epoch, using the disease name as the prompt. For each disease, we add an additional head that maps the original attention to a refined segmentation map.
  • Figure 2: With random initialization, the model tends to learn a shortcut and always highlights the same area. With Identity Enhanced Initialization (IEI), the model starts from the pretrained VLM's attention and refines its focus from there.
  • Figure 3: DWARF demonstrates sustained learning capacity, benefiting from extended training epochs, whereas the baseline model suffers from overfitting with additional training.
  • Figure 4: Qualitative results of training with and without the DWARF architecture demonstrate that utilizing our DWARF framework consistently enhances the aggregation of feature maps and provides prior region information.
  • Figure : Training Process for DWARF
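
The captions for Figures 1 and 2 describe two mechanisms that a small example may make more concrete: training on one disease per epoch, and giving each disease its own head that starts from the pretrained attention rather than from random weights. The following is a minimal sketch under stated assumptions, not the authors' implementation. The disease names, the helper functions (`disease_schedule`, `identity_head`, `apply_head`), the attention values, and the reading of Identity Enhanced Initialization as a plain identity-initialized linear head are all hypothetical choices made for illustration.

```python
from itertools import cycle, islice

# Hypothetical disease list; the actual label set is not given in this text.
DISEASES = ["atelectasis", "cardiomegaly", "effusion"]


def disease_schedule(diseases, num_epochs):
    """Return the disease trained in each epoch, cycling through the list."""
    return list(islice(cycle(diseases), num_epochs))


def identity_head(size):
    """Weight matrix for a per-disease head, initialized to the identity.

    Assumption: "identity" here means the head initially returns its input
    unchanged, so refinement starts from the pretrained attention values.
    """
    return [[1.0 if i == j else 0.0 for j in range(size)] for i in range(size)]


def apply_head(weights, attention):
    """Apply the linear head to a flattened attention map."""
    return [sum(w * a for w, a in zip(row, attention)) for row in weights]


# One head per disease, each starting as an identity map.
heads = {d: identity_head(4) for d in DISEASES}

# Five epochs over three diseases: the schedule wraps around after the third.
schedule = disease_schedule(DISEASES, num_epochs=5)

# Made-up flattened 2x2 attention map standing in for the VLM's output.
pretrained_attention = [0.1, 0.7, 0.15, 0.05]

# Before any training, the head reproduces the pretrained attention.
refined = apply_head(heads[schedule[0]], pretrained_attention)
```

In this sketch the head for the first scheduled disease leaves the attention map unchanged before any updates, which is the starting point the Figure 2 caption contrasts with random initialization. An actual training step, the loss, and the segmentation supervision are omitted because the text above does not specify them.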