Efficient In-Context Medical Segmentation with Meta-driven Visual Prompt Selection

Chenwei Wu, David Restrepo, Zitao Shuai, Zhongming Liu, Liyue Shen

TL;DR

MVPS addresses the sensitivity of in-context medical segmentation to prompt choice and domain shift by learning a meta-driven visual prompt retriever. It meta-trains a transformer-based retriever to select informative image-mask prompts from a support pool while keeping the large vision model frozen, using a Dice-based reward and policy-gradient optimization, with optional task augmentation and test-time adaptation. The approach yields consistent gains across 8 datasets, 4 tasks, and 3 modalities, demonstrating a data-centric, tuning-free enhancement that is compatible with multiple backbones and can complement model-centric methods like LoRA. This work enables label-efficient, cross-domain medical segmentation with practical potential for scalable deployment in diverse clinical settings.

Abstract

In-context learning (ICL) with Large Vision Models (LVMs) presents a promising avenue in medical image segmentation by reducing the reliance on extensive labeling. However, the ICL performance of LVMs depends heavily on the choice of visual prompts and suffers from domain shifts. While existing works leveraging LVMs for medical tasks have focused mainly on model-centric approaches like fine-tuning, we study an orthogonal, data-centric perspective: how to select good visual prompts to facilitate generalization to the medical domain. In this work, we propose a label-efficient in-context medical segmentation method by introducing a novel Meta-driven Visual Prompt Selection mechanism (MVPS), where a prompt retriever obtained from a meta-learning framework actively selects the optimal images as prompts to promote model performance and generalizability. Evaluated on 8 datasets and 4 tasks across 3 medical imaging modalities, our proposed approach demonstrates consistent gains over existing methods under different scenarios, improving both computational and label efficiency. Finally, we show that MVPS is a flexible, fine-tuning-free module that can be easily plugged into different backbones and combined with other model-centric approaches.
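
The summary above mentions a Dice-based reward and policy-gradient optimization for the prompt retriever. The following is a minimal sketch of that idea on a toy problem, not the authors' implementation: the retriever is reduced to a vector of logits over three candidate prompts, the frozen LVM is replaced by precomputed masks in `candidate_predictions`, and the fixed baseline of 0.5 is an assumption made for illustration. All function and variable names are hypothetical.

```python
import numpy as np

def dice_score(pred, target, eps=1e-7):
    """Dice coefficient between two binary masks (1.0 means perfect overlap)."""
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    intersection = np.logical_and(pred, target).sum()
    return (2.0 * intersection + eps) / (pred.sum() + target.sum() + eps)

def softmax(logits):
    """Numerically stable softmax over a 1-D array."""
    shifted = logits - logits.max()
    exp = np.exp(shifted)
    return exp / exp.sum()

def reinforce_step(logits, reward_fn, rng, lr=0.5, baseline=0.5):
    """One REINFORCE update of a categorical policy over prompt candidates."""
    probs = softmax(logits)
    action = rng.choice(len(logits), p=probs)  # sample one prompt index
    reward = reward_fn(action)
    grad_logp = -probs                         # d log pi(a) / d logits
    grad_logp[action] += 1.0                   # equals onehot(a) - probs
    return logits + lr * (reward - baseline) * grad_logp

# Toy ground-truth mask: a 2x2 square in a 4x4 image.
target = np.zeros((4, 4), dtype=bool)
target[:2, :2] = True

# Hypothetical stand-ins for what a frozen model would predict when
# conditioned on each of three candidate prompts.
empty = np.zeros_like(target)        # Dice close to 0
perfect = target.copy()              # Dice of 1
partial = np.zeros_like(target)      # Dice of about 0.5
partial[0, :] = True
candidate_predictions = [empty, perfect, partial]

def reward_fn(action):
    """Dice between the prediction induced by the chosen prompt and the target."""
    return dice_score(candidate_predictions[action], target)

def train_toy_retriever(steps=200, seed=0):
    """Run REINFORCE for a fixed number of steps and return selection probabilities."""
    rng = np.random.default_rng(seed)
    logits = np.zeros(len(candidate_predictions))
    for _ in range(steps):
        logits = reinforce_step(logits, reward_fn, rng)
    return softmax(logits)

final_probs = train_toy_retriever()
```

In this toy setting the policy shifts probability mass toward the candidate whose induced prediction has the highest Dice score, while the segmentation model itself receives no gradient, which mirrors the frozen-backbone setup described in the summary.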

Paper Structure (10 sections, 2 equations, 3 figures, 2 tables)

Figures (3)

  • Figure 1: a) In-context segmentation. Large Vision Models can take visual prompts of image-mask pairs and output the segmentation mask prediction for the query image. b) Instability of ICL with random prompting. In-context skin lesion segmentation experiments using SegGPT [wang2023seggpt] on 4 different dermatology datasets [tschandl2018ham10000, isicdataset, codella2018skin, mendoncca2015ph2], with a prompt size of 2, show a large variance across random prompts (mean Dice scores: $37.36\%$, $14.84\%$, $11.46\%$, and $31.80\%$). c) Better prompts lead to a significant improvement in ICL. In this simulation study, we iterate through all prompt selection options (with a prompt size of 2 in this example) given a prompt pool of 100 images and test ICL performance. There is substantial room for improvement over current prompt selection methods such as the TopK approach [zhang2023whatmakesgoodexamples].
  • Figure 2: Meta-training and meta-testing stages of the proposed MVPS framework (using a dermatology dataset as an example). Note that the prompt retriever is trainable while the large vision model is kept frozen.
  • Figure 3: Segmentation Results from MVPS vs TopK Prompting.
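
The exhaustive search described in Figure 1c can be sketched as follows. This is an illustrative outline under stated assumptions, not the authors' simulation code: prompt sets are treated as unordered pairs, and `toy_score` is a hypothetical placeholder for running the frozen model with a given prompt set and measuring Dice on held-out queries.

```python
from itertools import combinations
from math import comb

def best_prompt_set(pool_size, prompt_size, score_fn):
    """Score every unordered prompt set from the pool and return the best one."""
    best_set, best_score = None, float("-inf")
    for candidate in combinations(range(pool_size), prompt_size):
        score = score_fn(candidate)
        if score > best_score:  # strict comparison keeps the first best set found
            best_set, best_score = candidate, score
    return best_set, best_score

def toy_score(candidate):
    """Hypothetical stand-in for ICL performance; it simply favors indices near 42."""
    return -sum(abs(index - 42) for index in candidate)

# Number of unordered prompt pairs in a pool of 100 images.
num_candidates = comb(100, 2)

best_set, best_score = best_prompt_set(100, 2, toy_score)
```

With a pool of 100 images and a prompt size of 2 there are 4950 unordered pairs, so brute force is feasible for a simulation like this one. The count grows combinatorially with prompt size, which is one reason a learned retriever is attractive.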