Exploring Low-Resource Medical Image Classification with Weakly Supervised Prompt Learning
Fudan Zheng, Jindong Cao, Weijiang Yu, Zhiguang Chen, Nong Xiao, Yutong Lu
TL;DR
This work tackles the high annotation burden in medical image classification by introducing MedPrompt, a two-stage framework that first performs unsupervised vision-language pre-training on large-scale medical image-text data and then learns an instance-adaptive, weakly supervised prompt generator. The prompt generator, built from a Meta-Net together with context and class embeddings, automatically produces prompts that align with image embeddings, enabling zero-shot and few-shot classification without expert-designed prompts. Across four chest X-ray datasets, MedPrompt reaches superior or comparable zero-shot accuracy relative to hand-crafted prompts, and with only a few labeled samples it outperforms hand-crafted-prompt counterparts trained with full supervision. The prompt module is lightweight and can be embedded in other network architectures. The approach reduces reliance on domain experts and supports scalable, adaptable medical image recognition in low-resource settings.
Abstract
Most advances in medical image recognition for clinical auxiliary diagnosis are hindered by the low-resource situation in the medical field, where annotations are expensive and require professional expertise. This low-resource problem can be alleviated by leveraging the transferable representations of large-scale pre-trained vision-language models via relevant medical text prompts. However, existing pre-trained vision-language models require domain experts to carefully design the medical prompts, which greatly increases the burden on clinicians. To address this problem, we propose MedPrompt, a weakly supervised prompt learning method that automatically generates medical prompts. It comprises an unsupervised pre-trained vision-language model and a weakly supervised prompt learning model. The unsupervised pre-trained vision-language model exploits the natural correlation between medical images and their corresponding medical texts for pre-training, without any manual annotations. The weakly supervised prompt learning model uses only the class labels of the images in the dataset to guide the learning of the class-specific vector in the prompt, while the other context vectors in the prompt are learned without any manual annotations. To the best of our knowledge, this is the first model to automatically generate medical prompts. With these prompts, the pre-trained vision-language model is freed from its strong dependence on experts for manual annotation and manual prompt design. Experimental results show that, with only a minimal number of labeled samples for few-shot learning, the model using our automatically generated prompts outperforms its counterparts that use hand-crafted prompts with full-shot learning, and that it reaches superior or comparable accuracy on zero-shot image classification. The proposed prompt generator is lightweight and can therefore be embedded into any network architecture.
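To make the idea of an instance-adaptive prompt more concrete, the following is a minimal sketch in plain Python, not the authors' implementation. It assumes a design in which a small Meta-Net maps the image embedding to a bias that shifts shared context vectors, a class vector is appended to form the prompt, and the class whose encoded prompt has the highest cosine similarity to the image embedding is preferred. All function names, the two-layer Meta-Net, the mean-pooling stand-in for the text encoder, the temperature value, and the random weights are illustrative assumptions; the abstract does not specify these details.

```python
import math
import random


def meta_net(image_emb, w1, w2):
    """Hypothetical two-layer MLP: image embedding -> bias vector (ReLU hidden layer)."""
    hidden = [max(0.0, sum(w * x for w, x in zip(row, image_emb))) for row in w1]
    return [sum(w * h for w, h in zip(row, hidden)) for row in w2]


def build_prompt(context, class_vec, bias):
    """Shift every shared context vector by the image-conditioned bias,
    then append the class vector as the last prompt token."""
    shifted = [[c + b for c, b in zip(ctx, bias)] for ctx in context]
    return shifted + [class_vec]


def encode_text(prompt):
    """Stand-in for a frozen text encoder (assumption): mean-pool the prompt tokens."""
    dim = len(prompt[0])
    return [sum(tok[i] for tok in prompt) / len(prompt) for i in range(dim)]


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b + 1e-12)


def classify(image_emb, context, class_vecs, w1, w2, temperature=0.07):
    """Return class probabilities from a softmax over scaled cosine similarities."""
    bias = meta_net(image_emb, w1, w2)
    logits = [
        cosine(image_emb, encode_text(build_prompt(context, cv, bias))) / temperature
        for cv in class_vecs
    ]
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(value - peak) for value in logits]
    total = sum(exps)
    return [e / total for e in exps]


def rand_vec(rng, n):
    """Random Gaussian vector, used here only as placeholder weights and embeddings."""
    return [rng.gauss(0.0, 1.0) for _ in range(n)]


# Toy dimensions and random values (placeholders, not trained parameters).
rng = random.Random(0)
DIM, HIDDEN, N_CTX, N_CLS = 8, 4, 3, 2
w1 = [rand_vec(rng, DIM) for _ in range(HIDDEN)]      # HIDDEN x DIM
w2 = [rand_vec(rng, HIDDEN) for _ in range(DIM)]      # DIM x HIDDEN
context = [rand_vec(rng, DIM) for _ in range(N_CTX)]  # shared context vectors
class_vecs = [rand_vec(rng, DIM) for _ in range(N_CLS)]
image_emb = rand_vec(rng, DIM)

probs = classify(image_emb, context, class_vecs, w1, w2)
print(probs)
```

In a setting like the one the abstract describes, the class vectors would be the part guided by image class labels, while the context vectors and Meta-Net weights would be learned without additional manual annotation; in this sketch all of them are random placeholders, so the printed probabilities carry no meaning beyond showing the data flow.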
