Cross- and Intra-image Prototypical Learning for Multi-label Disease Diagnosis and Interpretation

Chong Wang, Fengbei Liu, Yuanhong Chen, Helen Frazer, Gustavo Carneiro

TL;DR

The paper tackles multi-label disease diagnosis and interpretation in medical imaging by introducing Cross- and Intra-image Prototypical Learning (CIPL). CIPL uses cross-image co-attention to disentangle the features of co-occurring diseases and intra-image alignment to regularise both interpretations and predictions, producing region-level class prototypes grounded in training-image patches. With this two-level regularisation and grounding strategy, CIPL achieves state-of-the-art classification on the NIH ChestX-ray14 and ODIR datasets and outperforms leading saliency- and prototype-based methods in weakly supervised thoracic disease localisation, while offering interpretable similarity maps that highlight disease-relevant regions. The approach also generalises to single-label tasks, showing the practical benefit of combining cross- and intra-image signals for robust, interpretable medical image analysis.
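The cross-image co-attention idea can be illustrated with a generic affinity-plus-softmax co-attention. This is a minimal sketch under our own assumptions, not CIPL's exact module: each image is reduced to a list of region feature vectors, the affinity is a plain dot product (no learned weight matrix), and the helper name `co_attend` is hypothetical. Each region of image a is re-expressed as a softmax-weighted mixture of the regions of image b, so features the two images share dominate the result.

```python
import math
from typing import List

Vector = List[float]


def dot(u: Vector, v: Vector) -> float:
    return sum(a * b for a, b in zip(u, v))


def softmax(scores: List[float]) -> List[float]:
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def co_attend(feats_a: List[Vector], feats_b: List[Vector]) -> List[Vector]:
    """For each region of image a, attend over the regions of image b.

    Hypothetical helper: affinity = dot product, weights = softmax over b's
    regions. Returns one attended vector per region of a, built from b's features.
    """
    dim = len(feats_b[0])
    attended = []
    for fa in feats_a:
        weights = softmax([dot(fa, fb) for fb in feats_b])
        mixed = [sum(w * fb[k] for w, fb in zip(weights, feats_b)) for k in range(dim)]
        attended.append(mixed)
    return attended
```

In the toy call `co_attend([[5.0, 0.0]], [[5.0, 0.0], [0.0, 5.0]])`, almost all of the weight falls on the region of b that matches the query, so the output is very close to `[[5.0, 0.0]]`; regions that appear in only one image receive little attention.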

Abstract

Recent advances in prototypical learning have shown remarkable potential to provide useful decision interpretations by associating activation maps and predictions with class-specific training prototypes. Such prototypical learning has been well studied for various single-label diseases, but for the highly relevant and more challenging problem of multi-label diagnosis, where multiple diseases often co-occur within an image, existing prototypical learning models struggle to obtain meaningful activation maps and effective class prototypes because the multiple diseases are entangled. In this paper, we present a novel Cross- and Intra-image Prototypical Learning (CIPL) framework for accurate multi-label disease diagnosis and interpretation from medical images. CIPL takes advantage of common cross-image semantics to disentangle the multiple diseases when learning the prototypes, allowing a comprehensive understanding of complicated pathological lesions. Furthermore, we propose a new two-level alignment-based regularisation strategy that leverages consistent intra-image information to enhance interpretation robustness and predictive performance. Extensive experiments show that CIPL attains state-of-the-art (SOTA) classification accuracy on two public multi-label disease-diagnosis benchmarks: thoracic radiography and fundus images. Quantitative interpretability results show that CIPL also outperforms other leading saliency- and prototype-based explanation methods in weakly supervised thoracic disease localisation.
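The two-level alignment-based regularisation compares an image view with an augmented view of the same image at the interpretation level (similarity maps) and at the prediction level (class probabilities). The sketch below is a minimal illustration under stated assumptions, not the paper's loss: the augmentation is a horizontal flip (so maps of the flipped view are flipped back before comparison), both terms use mean squared error, and the weights `lam_map` and `lam_pred` and all function names are hypothetical.

```python
import math
from typing import List

Map = List[List[float]]  # H x W similarity map for one prototype


def hflip(m: Map) -> Map:
    """Undo a horizontal-flip augmentation by reversing each row."""
    return [list(reversed(row)) for row in m]


def map_consistency(maps_b: List[Map], maps_b_flipped: List[Map]) -> float:
    """Interpretation-level term: mean squared difference between the maps of
    view b and the re-aligned maps of the flipped view b'."""
    total, count = 0.0, 0
    for mb, mf in zip(maps_b, maps_b_flipped):
        for row_b, row_f in zip(mb, hflip(mf)):
            for vb, vf in zip(row_b, row_f):
                total += (vb - vf) ** 2
                count += 1
    return total / count


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def prediction_consistency(logits_b: List[float], logits_bp: List[float]) -> float:
    """Prediction-level term: mean squared difference of per-class sigmoid
    probabilities (multi-label, so each class is scored independently)."""
    diffs = [(sigmoid(a) - sigmoid(b)) ** 2 for a, b in zip(logits_b, logits_bp)]
    return sum(diffs) / len(diffs)


def alignment_loss(maps_b: List[Map], maps_bp: List[Map],
                   logits_b: List[float], logits_bp: List[float],
                   lam_map: float = 1.0, lam_pred: float = 1.0) -> float:
    # lam_map / lam_pred are hypothetical weights, not values from the paper
    return (lam_map * map_consistency(maps_b, maps_bp)
            + lam_pred * prediction_consistency(logits_b, logits_bp))
```

If view b' is an exact mirror of view b, for example maps `[[[0.1, 0.9]]]` and `[[[0.9, 0.1]]]` with identical logits, the loss is zero; any disagreement in where a prototype fires or in the predicted probabilities is penalised.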

Paper Structure

This paper contains 22 sections, 14 equations, 7 figures, and 6 tables.

Figures (7)

  • Figure 1: In multi-label learning, (a) conventional prototypical learning strategy directly learns class prototypes from entangled multiple diseases present in training samples, by considering only individual-image information; (b) our cross-image prototypical learning strategy leverages common semantics of paired images to learn class prototypes from disentangled multiple diseases; (c) our intra-image prototypical learning strategy exploits consistent cues between paired augmented views of an image for regularising both interpretations and predictions.
  • Figure 2: (a) Architecture of our proposed CIPL method for multi-label disease diagnosis and interpretation. CIPL leverages a co-attention mechanism to mine cross-image semantics from paired images $\mathbf{x}_a$ and $\mathbf{x}_b$, with the goal of learning disentangled class prototypes from multi-label training samples. CIPL also regularises prototype learning with interpretation and prediction consistency between augmented views of an image, e.g., $\mathbf{x}_b$ and $\mathbf{x}_{b^{\prime}}$; (b) examples of prototypes that encode representative pathological lesions for each disease class (two prototypes per class are shown); (c) the co-attention mechanism extracts common semantics between paired images to disentangle multiple diseases.
  • Figure 3: Visual comparison of prototypes learned by ProtoPNet chen2019looks, ProtoPNet++ wang2022Knowledge, Deformable ProtoPNet donnelly2022deformable, PIP-Net nauta2023pip, and our CIPL. In each pair, the left image shows the prototype (outlined by yellow contours) in its source training image, and the right image highlights the prototype in its self-activated similarity map. The ground-truth image labels are written in red.
  • Figure 4: Visual prototypes of atelectasis (a), nodule (b), and infiltration (c), learned by our CIPL method from NIH ChestX-ray14. In each pair, the left image shows the prototype (outlined by yellow contours) in its source training image, and the right image highlights the prototype in its self-activated similarity map. The ground-truth image labels are written in red.
  • Figure 5: A typical example of CIPL interpreting its diagnosis of multiple diseases in a test chest X-ray image, where the green boxes indicate ground-truth disease annotations. For simplicity, the figure shows only one prototype and its similarity map for each of the classes infiltration, mass, and no finding.
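The similarity maps in the figures above, and the grounding of each prototype to a patch of a training image, can be sketched with the standard ProtoPNet recipe. This is a minimal illustration under stated assumptions, not CIPL's published implementation: the log-ratio activation and nearest-patch projection follow ProtoPNet, summing the max-pooled similarities of a class's prototypes stands in for a learned classification layer, and all function names are hypothetical. Feature maps are nested lists of patch vectors so the example runs with the standard library alone.

```python
import math
from typing import Dict, List, Tuple

Vector = List[float]
FeatureMap = List[List[Vector]]  # H x W grid of patch feature vectors


def sq_dist(u: Vector, v: Vector) -> float:
    return sum((a - b) ** 2 for a, b in zip(u, v))


def similarity(d: float, eps: float = 1e-4) -> float:
    """ProtoPNet-style log activation: large when the distance d is small."""
    return math.log((d + 1.0) / (d + eps))


def similarity_map(feats: FeatureMap, proto: Vector) -> List[List[float]]:
    """One similarity value per patch; upsampled, this gives the heat map."""
    return [[similarity(sq_dist(f, proto)) for f in row] for row in feats]


def class_scores(feats: FeatureMap, protos: Dict[str, List[Vector]]) -> Dict[str, float]:
    """Max-pool each prototype's map, then sum over the class's prototypes.
    (Simplification: ProtoPNet uses a learned last layer instead of a plain sum.)"""
    return {
        cls: sum(max(max(row) for row in similarity_map(feats, p)) for p in plist)
        for cls, plist in protos.items()
    }


def ground_prototype(proto: Vector, train_maps: List[FeatureMap]) -> Tuple[Vector, int, int, int]:
    """Replace a prototype with its nearest training patch and return
    (patch, image index, row, col) so it can be shown in its source image."""
    best = None
    for idx, fmap in enumerate(train_maps):
        for i, row in enumerate(fmap):
            for j, f in enumerate(row):
                d = sq_dist(f, proto)
                if best is None or d < best[0]:
                    best = (d, list(f), idx, i, j)
    _, patch, idx, i, j = best
    return patch, idx, i, j
```

In a multi-label setup, each class score would typically pass through its own sigmoid rather than a softmax across classes, so several diseases can be predicted as present in the same image.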