Disease-informed Adaptation of Vision-Language Models

Jiajin Zhang, Ge Wang, Mannudeep K. Kalra, Pingkun Yan

TL;DR

This work tackles the challenge of adapting pretrained vision-language models to underrepresented or novel diseases in medical imaging under severe data scarcity. It introduces a disease-informed adaptation framework comprising DiCoP and DPL: DiCoP builds disease-specific prompts grounded in clinical attributes (texture, location, shape) and enriches them with image context via an image feature projector, while DPL learns disease prototypes with geometric regularization to shape a well-structured latent space. The model is trained with three losses, $L_{ita}$, $L_{prot}$, and $L_{reg-ce}$, to align image and prompt representations and to enforce prototype intra-class cohesion and inter-class separation, with only the vision branch used for inference. Empirical results on PanNuke and COVID-x show substantial gains over adapter- and prompting-based baselines, especially with limited labeled data, and ablations validate the contribution of each component. Overall, the approach offers a practical, data-efficient pathway to deploy clinically informed VLMs across diverse medical imaging tasks and diseases.
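The summary above does not give the loss formulas, so the following is only a minimal sketch of the general idea behind prototype learning with intra-class cohesion and inter-class separation, followed by inference from image embeddings alone. Everything in it is an assumption made for illustration: the function names, the cosine-similarity cross-entropy used as a stand-in for a prototype loss, the pairwise-similarity penalty used as a stand-in for geometric regularization, and the temperature value. It is not the authors' definition of $L_{prot}$ or $L_{reg-ce}$.

```python
import math

# Illustrative sketch only: names, loss forms, and the temperature are
# assumptions, not the formulation used in the paper.

def normalize(v):
    """Scale a non-zero vector to unit length."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def cosine(a, b):
    """Cosine similarity between two non-zero vectors."""
    a, b = normalize(a), normalize(b)
    return sum(x * y for x, y in zip(a, b))

def prototype_loss(embedding, prototypes, label, temperature=0.1):
    """Cross-entropy over cosine similarities to each class prototype.

    Minimizing it pulls the embedding toward its own class prototype
    (cohesion) and away from the other prototypes.
    """
    logits = [cosine(embedding, p) / temperature for p in prototypes]
    m = max(logits)  # subtract the max for a numerically stable log-sum-exp
    log_z = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_z - logits[label]

def separation_penalty(prototypes):
    """Mean pairwise cosine similarity between distinct prototypes.

    Lower values mean the prototypes are spread further apart.
    """
    k = len(prototypes)
    pairs = [(i, j) for i in range(k) for j in range(i + 1, k)]
    if not pairs:
        return 0.0
    return sum(cosine(prototypes[i], prototypes[j]) for i, j in pairs) / len(pairs)

def predict(embedding, prototypes):
    """Vision-only inference: index of the most similar prototype."""
    sims = [cosine(embedding, p) for p in prototypes]
    return max(range(len(sims)), key=sims.__getitem__)

# Toy usage with two hypothetical 2-D class prototypes.
prototypes = [[1.0, 0.0], [0.0, 1.0]]
image_embedding = [0.9, 0.1]
predicted_class = predict(image_embedding, prototypes)
loss_if_class_0 = prototype_loss(image_embedding, prototypes, label=0)
loss_if_class_1 = prototype_loss(image_embedding, prototypes, label=1)
```

In this toy setup the embedding lies closer to the first prototype, so the loss is lower when the label is 0 than when it is 1, and the two orthogonal prototypes incur no separation penalty. A real implementation would operate on batched encoder outputs in an autodiff framework and would add the image-text alignment term described above.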

Abstract

In medical image analysis, the scarcity of expertise and the high cost of data annotation limit the development of large artificial intelligence models. This paper investigates the potential of transfer learning with pre-trained vision-language models (VLMs) in this domain. Currently, VLMs still struggle to transfer to underrepresented diseases with minimal presence in the pretraining dataset and to new diseases entirely absent from it. We argue that effective adaptation of VLMs hinges on nuanced representation learning of disease concepts. By capitalizing on the joint visual-linguistic capabilities of VLMs, we introduce disease-informed contextual prompting in a novel disease prototype learning framework. This approach enables VLMs to grasp the concepts of new diseases effectively and efficiently, even with limited data. Extensive experiments across multiple image modalities show notable performance improvements over existing techniques.

Paper Structure

This paper contains 10 sections, 4 equations, 4 figures, and 2 tables.

Figures (4)

  • Figure 1: The framework overview. (a) DiCoP produces prompts informed by specific diseases. (b) DPL enables learning disease representations with limited data.
  • Figure 2: t-SNE visualization of the latent space on PanNuke. (a) Linear probing, (b) CoCoOp, (c) our method, and (d) our method with the negative samples of each organ colored distinctly.
  • Figure 3: Data efficiency on (a) kidney cancer in PanNuke and (b) COVID in COVID-x.
  • Figure 4: Ablation studies on (a) kidney in PanNuke and (b) COVID in COVID-x.