SAMCT: Segment Any CT Allowing Labor-Free Task-Indicator Prompts

Xian Lin, Yangyang Xiang, Zhehao Wang, Kwang-Ting Cheng, Zengqiang Yan, Li Yu

TL;DR

This paper proposes SAMCT, a foundation model for CT segmentation that allows labor-free prompts, and trains it on a large CT dataset of 1.1M CT images and 5M masks collected from public datasets. Experiments demonstrate the superiority of SAMCT over state-of-the-art task-specific and SAM-based medical foundation models on various tasks.

Abstract

Segment anything model (SAM), a foundation model with superior versatility and generalization across diverse segmentation tasks, has attracted widespread attention in medical imaging. However, SAM has been shown to suffer severe performance degradation on medical images, owing to the lack of medical knowledge in its training and its weak local feature encoding. Although several SAM-based models have been proposed to adapt SAM to medical imaging, they still suffer from insufficient feature extraction and rely heavily on high-quality prompts. In this paper, we construct a large CT dataset consisting of 1.1M CT images and 5M masks from public datasets and propose SAMCT, a powerful foundation model that allows labor-free prompts. Specifically, SAMCT extends SAM with a U-shaped CNN image encoder, a cross-branch interaction module, and a task-indicator prompt encoder. The U-shaped CNN image encoder works in parallel with the ViT image encoder in SAM to supplement local features. The cross-branch interaction module enhances the feature expression capability of both encoders by passing global perception from the ViT image encoder to the CNN image encoder and local features in the opposite direction. The task-indicator prompt encoder is a plug-and-play component that encodes task-related indicators into prompt embeddings without manual effort. In this way, SAMCT can work automatically, in addition to supporting the semi-automatic interactive strategy of SAM. Extensive experiments demonstrate the superiority of SAMCT over state-of-the-art task-specific and SAM-based medical foundation models on various tasks. The code, data, and models are released at https://github.com/xianlin7/SAMCT.
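
The idea of a task-indicator prompt can be illustrated with a small sketch. This is not the authors' implementation: the class name, the task names, the embedding size, and the one-hot-times-weight-matrix design are all assumptions made for illustration, and the weights are random rather than learned. It only shows how a task label, rather than a clicked point or drawn box, could be turned into a prompt embedding.

```python
import random


class TaskIndicatorPromptEncoder:
    """Hypothetical sketch: maps a task name to a fixed-size prompt embedding."""

    def __init__(self, task_names, embed_dim=8, seed=0):
        rng = random.Random(seed)
        self.task_names = list(task_names)
        self.embed_dim = embed_dim
        # One weight row per task. Random here; a real model would learn them.
        self.weight = [
            [rng.gauss(0.0, 0.02) for _ in range(embed_dim)]
            for _ in self.task_names
        ]

    def one_hot(self, task):
        """Return the task indicator as a one-hot list."""
        index = self.task_names.index(task)  # ValueError for unknown tasks
        return [1.0 if i == index else 0.0 for i in range(len(self.task_names))]

    def encode(self, task):
        """Project the one-hot indicator through the weight matrix."""
        indicator = self.one_hot(task)
        return [
            sum(indicator[t] * self.weight[t][d] for t in range(len(self.task_names)))
            for d in range(self.embed_dim)
        ]


# Assumed task names, chosen only for the example.
encoder = TaskIndicatorPromptEncoder(["liver", "kidney", "lung"], embed_dim=4)
prompt = encoder.encode("kidney")
```

Because the indicator is one-hot, the projection reduces to selecting the weight row of the requested task, so no user interaction is needed at inference time beyond naming the task.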

Paper Structure (19 sections, 4 equations, 7 figures, 9 tables)

This paper contains 19 sections, 4 equations, 7 figures, and 9 tables.

Figures (7)

  • Figure 1: Overview of SAMCT. Modules painted in green, light red, and purple are the U-shaped CNN image encoder, the cross-branch interaction module, and the task-indicator prompt encoder, respectively. All modules from SAM are painted in blue and frozen during training. CLG represents the combination of convolution, layer normalization, and GELU. Trans.Conv. represents transpose convolution.
  • Figure 2: Two flows of the cross-branch interaction module.
  • Figure 3: Details of the task-indicator prompt encoder.
  • Figure 4: Comparison between SAMCT and SAM across 21 training-visible datasets (versatility) and 9 training-invisible datasets (generalization). SAMCT consistently outperforms SAM by large margins.
  • Figure 5: Dice boxplot of SAMCT on visible and invisible datasets across 39 objects. Objects are represented by groups of category letters and ID numbers, and the object mapping table is provided in Table \ref{tab3}.
  • ...and 2 more figures
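
The Dice score named in Figure 5 measures the overlap between a predicted mask and a ground-truth mask. The following is a minimal sketch of the standard definition on flat binary masks; the function name and the handling of two empty masks are assumptions, and the paper's exact evaluation protocol is not given in this section.

```python
def dice_score(pred, target):
    """Dice = 2 * |P and T| / (|P| + |T|) for two flat binary masks."""
    if len(pred) != len(target):
        raise ValueError("masks must have the same number of pixels")
    intersection = sum(1 for p, t in zip(pred, target) if p and t)
    total = sum(1 for p in pred if p) + sum(1 for t in target if t)
    if total == 0:
        # Assumed convention: two empty masks count as a perfect match.
        return 1.0
    return 2.0 * intersection / total


pred = [1, 1, 0, 0]
target = [1, 0, 1, 0]
score = dice_score(pred, target)  # one shared pixel: 2 * 1 / (2 + 2) = 0.5
```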