Efficient Universal Models for Medical Image Segmentation via Weakly Supervised In-Context Learning

Jiesi Hu, Yanwu Yang, Zhiyu Ye, Jinyan Zhou, Jianfeng Cao, Hanyang Peng, Ting Ma

TL;DR

This work tackles the high annotation cost of universal medical image segmentation by introducing Weakly Supervised In-Context Learning (WS-ICL), which uses weak prompts in the context set instead of dense masks. It combines a dual-branch, memory-efficient Neuroverse3D backbone with prompt channels to segment a target image conditioned on a context set, supporting both WS-ICL and interactive operation. Evaluations across 18 diverse datasets and three held-out distributions show that WS-ICL can match fully supervised ICL performance at a fraction of the annotation effort, while remaining highly competitive in interactive settings. The approach is a step toward a more efficient and unified framework for medical image segmentation, and the code and models are publicly available.

Abstract

Universal models for medical image segmentation, such as interactive and in-context learning (ICL) models, offer strong generalization but require extensive annotations. Interactive models need repeated user prompts for each image, while ICL relies on dense, pixel-level labels. To address this, we propose Weakly Supervised In-Context Learning (WS-ICL), a new ICL paradigm that leverages weak prompts (e.g., bounding boxes or points) instead of dense labels for context. This approach significantly reduces annotation effort by eliminating the need for fine-grained masks and repeated user prompting for all images. We evaluated the proposed WS-ICL model on three held-out benchmarks. Experimental results demonstrate that WS-ICL achieves performance comparable to regular ICL models at a significantly lower annotation cost. In addition, WS-ICL is highly competitive even under the interactive paradigm. These findings establish WS-ICL as a promising step toward more efficient and unified universal models for medical image segmentation. Our code and model are publicly available at https://github.com/jiesihu/Weak-ICL.
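To make the idea of a weak prompt concrete, the sketch below shows one plausible way to turn a bounding box or a set of points into a prompt channel and stack it with a context image, in the spirit of the "prompt channels" the summary describes. The function names, the binary encoding, and the channel-first layout are illustrative assumptions, not the encoding used in the authors' released code.

```python
import numpy as np


def box_prompt_channel(shape, box_min, box_max):
    """Hypothetical encoding: 1 inside an axis-aligned box (inclusive corners), 0 elsewhere."""
    channel = np.zeros(shape, dtype=np.float32)
    region = tuple(slice(lo, hi + 1) for lo, hi in zip(box_min, box_max))
    channel[region] = 1.0
    return channel


def point_prompt_channel(shape, points):
    """Hypothetical encoding: 1 at each clicked voxel, 0 elsewhere."""
    channel = np.zeros(shape, dtype=np.float32)
    for point in points:
        channel[tuple(point)] = 1.0
    return channel


def stack_context(image, prompt_channel):
    """Concatenate a context image and its prompt channel along a new leading channel axis."""
    return np.stack([image, prompt_channel], axis=0)


# Toy 3D volume with a box prompt and two point prompts.
volume = np.random.rand(4, 4, 4).astype(np.float32)
box = box_prompt_channel(volume.shape, (1, 1, 1), (2, 2, 2))
pts = point_prompt_channel(volume.shape, [(0, 0, 0), (3, 3, 3)])
context_input = stack_context(volume, box)
```

Under these assumptions a dense mask is never needed for the context set: each context image carries only the coarse region or clicks supplied by the annotator.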

Paper Structure

This paper contains 8 sections, 5 figures, and 2 tables.

Figures (5)

  • Figure 1: Comparison of segmentation paradigms. Regular in-context learning segmentation requires fine-grained masks for the context set, while interactive segmentation relies on per-image prompts. Our proposed weakly supervised ICL paradigm integrates the strengths of both approaches by using prompts in the context set, eliminating the need for fine-grained annotations or repeated prompting.
  • Figure 2: Illustration of the proposed WS-ICL task. Context images are concatenated with the prompt channels and jointly processed with the target image through the WS-ICL network to generate the segmentation prediction.
  • Figure 3: Qualitative results of WS-ICL (Box) with 8 context images and 5 prompts per image.
  • Figure 4: Model performance with different context set sizes. The legend indicates the number of prompts per image. Dice scores are averaged over all tasks.
  • Figure 5: Performance of different models and the corresponding annotation time for context construction. Annotation times are approximated as 5, 10, 80, and 1600 seconds for point, bounding box, 2D mask, and 3D mask, respectively. Dice scores are averaged over all tasks.
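
The per-annotation times quoted in the Figure 5 caption allow a rough comparison of context-construction cost across annotation types. The sketch below assumes the total cost grows linearly with the number of context images and, for weak prompts, with the number of prompts per image; that accumulation rule and the function name are assumptions for illustration, not taken from the paper.

```python
# Approximate seconds per annotation, as stated in the Figure 5 caption.
SECONDS_PER_ANNOTATION = {
    "point": 5,
    "box": 10,
    "mask_2d": 80,
    "mask_3d": 1600,
}


def context_annotation_seconds(kind, n_context_images, prompts_per_image=1):
    """Estimate the total annotation time for a context set.

    Assumes cost adds up linearly: one annotation per prompt per image
    (an illustrative assumption, not the paper's exact accounting).
    """
    return SECONDS_PER_ANNOTATION[kind] * n_context_images * prompts_per_image


# Setting from the Figure 3 caption: 8 context images, 5 box prompts per image.
box_cost = context_annotation_seconds("box", 8, prompts_per_image=5)
# Same context size annotated with one dense 3D mask per image.
mask_cost = context_annotation_seconds("mask_3d", 8)
ratio = mask_cost / box_cost
```

Under this linear assumption, the box-prompt context takes 400 seconds to annotate against 12,800 seconds for dense 3D masks, a 32-fold difference.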