Weakly Supervised Concept Learning with Class-Level Priors for Interpretable Medical Diagnosis
Md Nahiduzzaman, Steven Korevaar, Alireza Bab-Hadiashar, Ruwan Tennakoon
TL;DR
This work tackles interpretable medical prediction without costly concept-level annotations or reliance on vision-language models. It proposes the Prior-guided Concept Predictor (PCP), a weakly supervised framework that predicts a vector of concept answers from a medical image. PCP leverages class-level priors $P(c_m|y)$ together with a residual refinement mechanism that preserves image-specific information. A composite loss combining triplet, class-matching, KL-divergence, and entropy terms aligns the predicted concept distributions with the priors and sharpens attention to clinically relevant concepts. PCP outperforms zero-shot baselines in concept prediction on dermoscopy (PH2) and hematology (WBCatt) data. Across four datasets spanning dermoscopy, hematology, and chest imaging, its classification performance is competitive with fully supervised models, showing that reliable, interpretable concept reasoning is achievable with minimal supervision. The results point toward scalable, interpretable diagnostic AI in clinical settings. Future work will explore adaptive prior updates and self-distilled reasoning to further improve generalization.
Abstract
Human-interpretable predictions are essential for deploying AI in medical imaging, yet most interpretable-by-design (IBD) frameworks require concept annotations for the training data, which are costly and impractical to obtain in clinical contexts. Recent attempts to bypass annotation, such as zero-shot vision-language models or concept-generation frameworks, struggle to capture domain-specific medical features and therefore produce unreliable concepts. In this paper, we propose the Prior-guided Concept Predictor (PCP), a novel weakly supervised framework that predicts concept answers without explicit concept-level supervision or reliance on language models. PCP uses class-level concept priors as weak supervision and incorporates a refinement mechanism with KL-divergence and entropy regularization to align its predictions with clinical reasoning. Experiments on PH2 (dermoscopy) and WBCatt (hematology) show that PCP improves concept-level F1-score by over 33% compared with zero-shot baselines. On four medical datasets (PH2, WBCatt, HAM10000, and CXR4), its classification performance is competitive with fully supervised concept bottleneck models (CBMs) and V-IP.
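The abstract does not give the exact form of the KL and entropy terms. The sketch below is therefore a minimal illustration of one plausible reading rather than the authors' implementation. It assumes each concept $c_m$ is binary, that the KL term compares the class-level prior $P(c_m=1|y)$ with the predicted probability as two Bernoulli distributions, and that the entropy term is the binary entropy of each prediction (lower entropy means sharper predictions). The names `prior_regularizer`, `lam_kl`, and `lam_ent` are hypothetical. The triplet and class-matching terms and the residual refinement mechanism are omitted.

```python
import math
from typing import Sequence

EPS = 1e-7  # keeps logarithms finite when probabilities hit 0 or 1


def _clip(p: float) -> float:
    """Clamp a probability into [EPS, 1 - EPS]."""
    return min(max(p, EPS), 1.0 - EPS)


def bernoulli_kl(q: float, p: float) -> float:
    """KL(Bern(q) || Bern(p)): how far the prediction p is from the prior q."""
    q, p = _clip(q), _clip(p)
    return q * math.log(q / p) + (1.0 - q) * math.log((1.0 - q) / (1.0 - p))


def bernoulli_entropy(p: float) -> float:
    """Binary entropy in nats; small values mean a confident prediction."""
    p = _clip(p)
    return -(p * math.log(p) + (1.0 - p) * math.log(1.0 - p))


def prior_regularizer(pred: Sequence[float], prior: Sequence[float],
                      lam_kl: float = 1.0, lam_ent: float = 0.1) -> float:
    """Hypothetical KL + entropy regularizer for a single image.

    pred[m]  -- predicted probability that concept m is present
    prior[m] -- assumed class-level prior P(c_m = 1 | y) for the image's label y
    lam_kl, lam_ent -- hypothetical weights; the abstract does not state them
    """
    if len(pred) != len(prior) or not pred:
        raise ValueError("pred and prior must be non-empty and equally long")
    m = len(pred)
    kl = sum(bernoulli_kl(q, p) for q, p in zip(prior, pred)) / m
    ent = sum(bernoulli_entropy(p) for p in pred) / m
    return lam_kl * kl + lam_ent * ent


if __name__ == "__main__":
    # Illustrative numbers only; these are not priors from the paper.
    class_prior = [0.9, 0.2, 0.7]
    print(prior_regularizer([0.85, 0.1, 0.75], class_prior))  # close to prior
    print(prior_regularizer([0.1, 0.9, 0.2], class_prior))    # far from prior
```

Under these assumptions, the KL term pulls each image's concept predictions toward its class's prior, while the entropy term pushes them away from uninformative values near 0.5. A residual refinement branch would then let individual images deviate from the prior where the image evidence supports it.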
