
Semi-Supervised Few-Shot Adaptation of Vision-Language Models

Julio Silva-Rodríguez, Ender Konukoglu

TL;DR

The proposed method introduces an efficient semi-supervised solver that propagates text-informed pseudo-labels over unlabeled data during few-shot adaptation of VLMs. This enables lower-budget annotation pipelines, reducing labeling effort by >50% in low-shot regimes.

Abstract

Vision-language models (VLMs) pre-trained on large, heterogeneous data sources are becoming increasingly popular, providing rich multi-modal embeddings that enable efficient transfer to new tasks. A particularly relevant application is few-shot adaptation, where only a handful of annotated examples are available to adapt the model through multi-modal linear probes. In medical imaging, specialized VLMs have shown promising performance in zero- and few-shot image classification, which is valuable for mitigating the high cost of expert annotations. However, challenges remain in extremely low-shot regimes: the inherent class imbalances in medical tasks often lead to underrepresented categories, penalizing overall model performance. To address this limitation, we propose leveraging unlabeled data by introducing an efficient semi-supervised solver that propagates text-informed pseudo-labels during few-shot adaptation. The proposed method enables lower-budget annotation pipelines for adapting VLMs, reducing labeling effort by >50% in low-shot regimes.
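
The abstract does not spell out the solver, so the following is only a generic sketch of the idea it describes, not the authors' algorithm. It assumes pre-computed image and text embeddings from a frozen VLM, initializes one class prototype per text embedding (the zero-shot classifier), and then alternates between soft pseudo-labeling the unlabeled images and re-estimating the prototypes from the text anchors, the few labeled shots, and the pseudo-labeled features. All names and hyperparameters (`adapt_prototypes`, `temp`, `text_weight`, `unl_weight`, `n_iters`) are hypothetical.

```python
import numpy as np


def l2_normalize(x, axis=-1, eps=1e-12):
    """Scale vectors to unit L2 norm along `axis`."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)


def softmax(z, axis=-1):
    """Numerically stable softmax."""
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def adapt_prototypes(x_lab, y_lab, x_unl, text_emb,
                     temp=0.05, text_weight=1.0, unl_weight=0.5, n_iters=10):
    """Illustrative semi-supervised prototype adaptation (an assumption,
    not the paper's solver).

    x_lab:    (n_lab, D) embeddings of the labeled shots
    y_lab:    (n_lab,)   integer class labels
    x_unl:    (n_unl, D) embeddings of unlabeled images
    text_emb: (C, D)     one text embedding per class
    Returns the adapted prototypes (C, D) and soft pseudo-labels (n_unl, C).
    """
    x_lab = l2_normalize(x_lab)
    x_unl = l2_normalize(x_unl)
    text_emb = l2_normalize(text_emb)
    n_classes = text_emb.shape[0]

    # Sum of labeled features per class, shape (C, D).
    onehot = np.eye(n_classes)[y_lab]
    lab_sum = onehot.T @ x_lab

    # Start from the zero-shot classifier: prototypes equal the text embeddings.
    protos = text_emb.copy()
    for _ in range(n_iters):
        # Soft pseudo-labels for unlabeled images from the current prototypes.
        q = softmax(x_unl @ protos.T / temp)
        # Pseudo-label-weighted mean of the unlabeled features per class.
        unl_mean = (q.T @ x_unl) / (q.sum(axis=0)[:, None] + 1e-12)
        # Re-estimate prototypes; the text term keeps them text-informed.
        protos = l2_normalize(text_weight * text_emb + lab_sum
                              + unl_weight * unl_mean)

    # Final pseudo-labels under the adapted prototypes.
    q = softmax(x_unl @ protos.T / temp)
    return protos, q
```

In this sketch the text embeddings act as a fixed anchor, so a class with few or no labeled shots still receives a prototype shaped by its prompt and by the unlabeled images assigned to it. That is one plausible way to read "text-informed pseudo-labels" for underrepresented categories; the paper's actual objective and update rule may differ.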


Paper Structure

This paper contains 11 sections, 10 equations, 3 figures, 1 table.

Figures (3)

  • Figure 1: Semi-supervised few-shot VLM adaptation.
  • Figure 2: Few-shot adaptation performance per dataset.
  • Figure 3: Studies on efficiency, convergence, and exploratory analysis.