TP-DRSeg: Improving Diabetic Retinopathy Lesion Segmentation with Explicit Text-Prompts Assisted SAM

Wenxue Li, Xinyu Xiong, Peng Xia, Lie Ju, Zongyuan Ge

TL;DR

TP-DRSeg is a framework that customizes SAM for text-prompted DR lesion segmentation. It introduces an explicit prior encoder that converts implicit medical concepts into explicit prior knowledge, providing explainable clues for extracting the low-level features associated with lesions.

Abstract

Recent advances in large foundation models, such as the Segment Anything Model (SAM), have demonstrated considerable promise across various tasks. Despite their progress, these models still encounter challenges in specialized medical image analysis, especially in recognizing subtle inter-class differences in Diabetic Retinopathy (DR) lesion segmentation. In this paper, we propose a novel framework that customizes SAM for text-prompted DR lesion segmentation, termed TP-DRSeg. Our core idea involves exploiting language cues to inject medical prior knowledge into the vision-only segmentation network, thereby combining the advantages of different foundation models and enhancing the credibility of segmentation. Specifically, to unleash the potential of vision-language models in the recognition of medical concepts, we propose an explicit prior encoder that transfers implicit medical concepts into explicit prior knowledge, providing explainable clues to excavate low-level features associated with lesions. Furthermore, we design a prior-aligned injector to inject explicit priors into the segmentation process, which can facilitate knowledge sharing across multi-modality features and allow our framework to be trained in a parameter-efficient fashion. Experimental results demonstrate the superiority of our framework over other traditional models and foundation model variants.

Paper Structure

This paper contains 12 sections, 4 equations, 4 figures, 3 tables.

Figures (4)

  • Figure 1: (a) Comparison of existing methods and our proposed method. (b) Class activation maps generated by CLIP. The bottom-left map uses the text embedding of the implicit class name (hard exudate). The bottom-right map uses the text embedding of an explicit description of the lesion (yellowish-white deposits).
  • Figure 2: Overview of the proposed framework. The explicit prior encoder extracts explainable clues and generates prior knowledge for segmentation. This explicit prior is then fed into the prior-aligned injector (Injector), which injects the prior knowledge into the feature encoding process. The class-specific prompt generator produces the segmentation map according to the provided text-based class.
  • Figure 3: Detailed architectures of (a) the prior-aligned injector (Injector) and (b) the class-specific prompt generator.
  • Figure 4: Qualitative comparison and visualized feature maps with and without the integration of explicit prior.
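
The class activation maps described for Figure 1(b) can be pictured as a similarity map between per-patch image features and a text embedding. The sketch below is only an illustration of that general idea, not the paper's explicit prior encoder: the toy vectors stand in for CLIP outputs, and the function names (`l2_normalize`, `similarity_map`, `min_max_rescale`) are hypothetical.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (left unchanged if its norm is zero)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def similarity_map(patch_features, text_embedding):
    """Cosine similarity between each patch feature and one text embedding.

    patch_features: H x W grid (nested lists) of D-dimensional vectors.
    Returns an H x W grid of similarities in [-1, 1].
    """
    text = l2_normalize(text_embedding)
    return [
        [sum(p * t for p, t in zip(l2_normalize(patch), text)) for patch in row]
        for row in patch_features
    ]


def min_max_rescale(sim_map):
    """Rescale a similarity map to [0, 1] so it can act as a soft spatial prior."""
    flat = [v for row in sim_map for v in row]
    lo, hi = min(flat), max(flat)
    span = hi - lo
    if span == 0:
        return [[0.0 for _ in row] for row in sim_map]
    return [[(v - lo) / span for v in row] for row in sim_map]


# Toy 2 x 2 grid of 3-dimensional "patch features" (assumed values, not real
# CLIP outputs).
patches = [
    [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]],
    [[0.0, 0.0, 1.0], [1.0, 1.0, 0.0]],
]
# Stand-in for the embedding of an explicit description such as
# "yellowish-white deposits" (assumed values).
text = [1.0, 0.0, 0.0]

prior = min_max_rescale(similarity_map(patches, text))
for row in prior:
    print([round(v, 3) for v in row])
```

In this toy setup, patches whose features point in the same direction as the text embedding receive the highest prior values. Comparing a class name against an explicit description, as Figure 1(b) does, would amount to swapping the `text` vector.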