Medical Visual Prompting (MVP): A Unified Framework for Versatile and High-Quality Medical Image Segmentation

Yulin Chen, Guoheng Huang, Kai Huang, Zijin Lin, Guo Zhong, Shenghong Luo, Jie Deng, Jian Zhou

TL;DR

This work tackles accurate, shape-preserving medical image segmentation across diverse modalities when labeled data is limited. It introduces Medical Visual Prompting (MVP), a framework that keeps the backbone frozen and injects shape priors through three prompting modules: Super-Pixel Guided Prompting (SPGP), Image Embedding Guided Prompting (IEGP), and Adaptive Attention Mechanism Guided Prompting (AAGP). SPGP and IEGP are fused to form the visual prompts, and AAGP adapts those prompts across layers. With relatively few trainable parameters, MVP achieves strong, task-general performance on five datasets spanning endoscopy, CT, and MRI. The results indicate that MVP could streamline clinical segmentation workflows and support multitask learning without extensive task-specific fine-tuning.
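The idea of a frozen backbone with a small set of trainable prompt parameters can be illustrated with a toy example. This is a minimal sketch under stated assumptions, not the MVP architecture: `W_backbone` is a hypothetical fixed linear map standing in for a pre-trained encoder, `prompt` is a hypothetical additive vector, and the data is random. Only the prompt receives gradient updates.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumption: a fixed linear map stands in for a pre-trained, frozen backbone.
W_backbone = rng.normal(size=(16, 8))
# Assumption: the prompt is a small additive vector; it is the only trainable part.
prompt = np.zeros(16)

def forward(x, prompt):
    """Add the prompt to the input features, then apply the frozen backbone."""
    return (x + prompt) @ W_backbone

def mse(x, target, prompt):
    """Mean squared error over all batch entries and output dimensions."""
    return float(np.mean((forward(x, prompt) - target) ** 2))

def train_step(x, target, prompt, lr=0.01):
    """One gradient-descent step on the MSE that updates the prompt only."""
    err = forward(x, prompt) - target                      # (batch, out_dim)
    grad = 2.0 * (err @ W_backbone.T).mean(axis=0) / err.shape[1]
    return prompt - lr * grad

# Toy data (random, for illustration only).
x = rng.normal(size=(32, 16))
target = rng.normal(size=(32, 8))

W_snapshot = W_backbone.copy()
loss_before = mse(x, target, prompt)
for _ in range(50):
    prompt = train_step(x, target, prompt)
loss_after = mse(x, target, prompt)

trainable_fraction = prompt.size / (prompt.size + W_backbone.size)
print(loss_before, loss_after, trainable_fraction)
```

In this toy setup the loss drops while `W_backbone` stays bit-for-bit unchanged, and the prompt accounts for 16 of 144 parameters. The real framework would use a deep pre-trained network and learned prompt modules rather than a single additive vector.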

Abstract

Accurate segmentation of lesion regions is crucial for clinical diagnosis and treatment across various diseases. While deep convolutional networks have achieved satisfactory results in medical image segmentation, they face two challenges: lesion shape information is lost through repeated convolution and downsampling, and manually labeling lesions of varying shapes and sizes is costly. To address these issues, we propose a novel medical visual prompting (MVP) framework that leverages pre-training and prompting concepts from natural language processing (NLP). The framework has three key components: Super-Pixel Guided Prompting (SPGP), which partitions the input image into superpixels; Image Embedding Guided Prompting (IEGP), which freezes the patch embedding and merges it with the superpixels to provide visual prompts; and Adaptive Attention Mechanism Guided Prompting (AAGP), which pinpoints the prompt content and efficiently adapts all layers. By integrating SPGP, IEGP, and AAGP, MVP enables the segmentation network to better learn shape prompting information and facilitates mutual learning across different tasks. Extensive experiments on five datasets demonstrate superior performance on a variety of challenging medical image tasks, while simplifying single-task medical segmentation models. The framework offers improved performance with fewer parameters and holds significant potential for accurate segmentation of lesion regions in various medical tasks, making it clinically valuable.
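The three components described above can be sketched in a few lines of NumPy. This is a simplified illustration under several assumptions, not the authors' implementation: a regular grid replaces a real superpixel algorithm such as SLIC, `W_embed` is a hypothetical frozen patch projection, and `fuse_prompts` uses a hypothetical per-token softmax gate as a stand-in for the adaptive attention step. All function names are invented for this example.

```python
import numpy as np

def grid_superpixel_labels(h, w, cell):
    """Assign each pixel to a square cell (crude stand-in for real superpixels)."""
    rows = np.arange(h) // cell
    cols = np.arange(w) // cell
    n_cols = -(-w // cell)  # ceiling division
    return rows[:, None] * n_cols + cols[None, :]

def superpixel_mean_image(image, labels):
    """Replace every pixel by the mean intensity of its superpixel."""
    flat = labels.ravel()
    sums = np.bincount(flat, weights=image.ravel())
    counts = np.bincount(flat)
    means = sums / np.maximum(counts, 1)
    return means[labels]

def patch_embed(image, patch, weight):
    """Split into non-overlapping patches and project each with a fixed matrix."""
    h, w = image.shape
    patches = image.reshape(h // patch, patch, w // patch, patch).swapaxes(1, 2)
    return patches.reshape(-1, patch * patch) @ weight

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def fuse_prompts(sp_tokens, img_tokens, gate_weight):
    """Weight the two prompt sources per token with a softmax gate, then sum."""
    stacked = np.stack([sp_tokens, img_tokens], axis=1)   # (N, 2, D)
    attn = softmax(stacked @ gate_weight, axis=1)         # (N, 2)
    return (attn[..., None] * stacked).sum(axis=1)        # (N, D)

rng = np.random.default_rng(0)
image = rng.random((32, 32))                    # toy grayscale image
labels = grid_superpixel_labels(32, 32, cell=4)
sp_image = superpixel_mean_image(image, labels)

W_embed = rng.normal(size=(64, 16))             # frozen projection (assumption)
sp_tokens = patch_embed(sp_image, 8, W_embed)   # superpixel-derived tokens
img_tokens = patch_embed(image, 8, W_embed)     # image-derived tokens
gate = rng.normal(size=16)                      # trainable gate (assumption)

prompt_tokens = fuse_prompts(sp_tokens, img_tokens, gate)
print(prompt_tokens.shape)  # (16, 16)
```

Because the gate produces a convex combination, each fused token lies elementwise between its superpixel-derived and image-derived counterparts. In a real pipeline, the fused tokens would be fed to the frozen backbone's layers, and only the gate (and any other prompt parameters) would be trained.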

Paper Structure

This paper contains 16 sections, 11 equations, 4 figures, 6 tables.

Figures (4)

  • Figure 1: A novel medical visual prompting framework with a frozen backbone, which can be applied to different medical data without updating the model.
  • Figure 2: The architecture of the proposed effective medical visual prompting (EMVP). We use the Super-Pixel Guided Prompting (SPGP) and the Image Embedding Guided Prompting (IEGP) to tune the extracted features. The Adaptive Attention Mechanism Guided Prompting (AAGP) is designed to merge these features to focus on more effective visual prompting.
  • Figure 3: Comparison with other task-specific methods. We show results for the Nasopharynx dataset (top), the ESOCT dataset (middle), and the BIMR dataset (bottom).
  • Figure : Quantitative results on endoscopic segmentation.