Cross- and Intra-image Prototypical Learning for Multi-label Disease Diagnosis and Interpretation
Chong Wang, Fengbei Liu, Yuanhong Chen, Helen Frazer, Gustavo Carneiro
TL;DR
The paper tackles multi-label disease diagnosis and interpretation in medical imaging by introducing Cross- and Intra-image Prototypical Learning (CIPL). CIPL uses cross-image co-attention to disentangle the features of co-occurring diseases, and intra-image alignment to regularise both interpretations and predictions, producing region-level class prototypes grounded in training patches. With its two-level alignment-based regularisation and prototype grounding strategy, CIPL achieves state-of-the-art classification accuracy on the NIH ChestX-ray14 (thoracic radiography) and ODIR (fundus) benchmarks, and outperforms leading saliency- and prototype-based explanation methods in weakly supervised thoracic disease localisation, while offering interpretable similarity maps that highlight disease-relevant regions. The approach also generalises to single-label tasks and shows the practical value of combining cross- and intra-image signals for robust, interpretable medical image analysis.
Abstract
Recent advances in prototypical learning have shown remarkable potential to provide useful decision interpretations by associating activation maps and predictions with class-specific training prototypes. Such prototypical learning has been well studied for various single-label diseases. In the relevant and more challenging setting of multi-label diagnosis, however, where multiple diseases often occur concurrently within an image, existing prototypical learning models struggle to obtain meaningful activation maps and effective class prototypes because the diseases are entangled. In this paper, we present a novel Cross- and Intra-image Prototypical Learning (CIPL) framework for accurate multi-label disease diagnosis and interpretation from medical images. CIPL takes advantage of common cross-image semantics to disentangle the multiple diseases when learning the prototypes, allowing a comprehensive understanding of complicated pathological lesions. Furthermore, we propose a new two-level alignment-based regularisation strategy that effectively leverages consistent intra-image information to enhance interpretation robustness and predictive performance. Extensive experiments show that CIPL attains state-of-the-art (SOTA) classification accuracy on two public multi-label benchmarks of disease diagnosis: thoracic radiography and fundus images. Quantitative interpretability results show that CIPL also outperforms other leading saliency- and prototype-based explanation methods in weakly-supervised thoracic disease localisation.
