CAT: Coordinating Anatomical-Textual Prompts for Multi-Organ and Tumor Segmentation

Zhongzhen Huang, Yankai Jiang, Rongzhao Zhang, Shaoting Zhang, Xiaofan Zhang

TL;DR

CAT is a model that Coordinates Anatomical prompts derived from 3D cropped images with Textual prompts enriched by medical domain knowledge. Its results suggest that coordinating multimodal prompts is a promising avenue for addressing complex scenarios in the medical domain.

Abstract

Existing promptable segmentation methods in medical imaging primarily consider either textual or visual prompts to segment relevant objects, yet they often fall short when addressing anomalies such as tumors, which may vary greatly in shape, size, and appearance. Recognizing the complexity of medical scenarios and the limitations of using textual or visual prompts alone, we propose a novel dual-prompt schema that leverages the complementary strengths of visual and textual prompts for segmenting various organs and tumors. Specifically, we introduce CAT, an innovative model that Coordinates Anatomical prompts derived from 3D cropped images with Textual prompts enriched by medical domain knowledge. The model architecture adopts a general query-based design, where prompt queries facilitate segmentation queries for mask prediction. To synergize the two types of prompts within a unified framework, we implement a ShareRefiner, which refines both segmentation and prompt queries while disentangling the two types of prompts. Trained on a consortium of 10 public CT datasets, CAT demonstrates superior performance in multiple segmentation tasks. Further validation on a specialized in-house dataset reveals a remarkable capacity for segmenting tumors across multiple cancer stages. This approach confirms that coordinating multimodal prompts is a promising avenue for addressing complex scenarios in the medical domain.
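
The abstract describes a query-based design in which segmentation queries produce the predicted masks. As a rough illustration of that general idea (not the authors' implementation), the sketch below scores every voxel against every query with a dot product and thresholds the result. The function name predict_masks, the tensor shapes, and the 0.5 threshold are all assumptions made for this example.

```python
import numpy as np

def predict_masks(seg_queries, voxel_features, threshold=0.5):
    """Generic query-based mask prediction (illustrative, not CAT's code).

    seg_queries:    (N, C) array, one embedding per segmentation query.
    voxel_features: (C, D, H, W) array of per-voxel features.
    Returns a boolean array of shape (N, D, H, W), one mask per query.
    """
    # Dot product between each query and each voxel's feature vector.
    logits = np.einsum("nc,cdhw->ndhw", seg_queries, voxel_features)
    # Sigmoid turns the logits into per-voxel foreground probabilities.
    probs = 1.0 / (1.0 + np.exp(-logits))
    return probs > threshold

# Random stand-ins for real query embeddings and image features.
rng = np.random.default_rng(0)
queries = rng.normal(size=(3, 8))
features = rng.normal(size=(8, 4, 16, 16))
masks = predict_masks(queries, features)
print(masks.shape)  # (3, 4, 16, 16)
```

In a design of this kind, refining the queries with prompt information changes which voxels each query responds to, while the mask head itself stays a simple dot product.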

Paper Structure (18 sections, 8 equations, 6 figures, 7 tables)

This paper contains 18 sections, 8 equations, 6 figures, 7 tables.

Figures (6)

  • Figure 1: Left: Long-tailed distribution of categories against the number of available cases in the medical field. Right: Tumors at different cancer stages with diverse shapes and sizes.
  • Figure 2: (a) CAT follows the query-based segmentation architecture. 3D volumes cropped according to anatomical structure serve as anatomical prompts. Texts enhanced by professional knowledge serve as textual prompts. Learnable queries and both prompts are used for the final prediction via ShareRefiner and PromptRefer. (b) A case of a Stage-IV colon tumor invading the intestine. (c) Attention masks in PromptRefer for assigning specific prompts to queries.
  • Figure 3: Qualitative visualizations of the proposed model and other prompting methods on organ/tumor segmentation. The segmentation results presented from rows one to five correspond, in order, to the duodenum, liver tumors, pancreas tumors, colon tumors, and colon tumors in Stage-IV.
  • Figure 4: t-SNE visualization of the feature distributions. Left: The two types of prompt embeddings before and after refinement. Right: Segmentation query features with and without contrastive alignment. (1-9: right kidney, left kidney, liver, pancreas, colon, kidney tumor, liver tumor, pancreas tumor, colon tumor).
  • Figure 5: Heatmaps of two samples for analyzing the effectiveness of two prompts.
  • ...and 1 more figure
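
The caption of Figure 2(c) mentions attention masks that assign specific prompts to queries. A minimal sketch of how such a mask could work in ordinary cross-attention follows. It is a generic masked-attention example, not the paper's PromptRefer: the function name masked_cross_attention, the shapes, and the particular query-to-prompt assignment are hypothetical, and every query is assumed to have at least one allowed prompt.

```python
import numpy as np

def masked_cross_attention(queries, prompts, allowed):
    """Cross-attention in which each query may attend only to allowed prompts.

    queries: (Q, C) query embeddings.
    prompts: (P, C) prompt embeddings.
    allowed: (Q, P) boolean array, True where a query may attend to a prompt.
             Assumption: every row contains at least one True entry.
    Returns the attended output (Q, C) and the attention weights (Q, P).
    """
    scores = queries @ prompts.T / np.sqrt(queries.shape[1])
    # Blocked pairs get -inf so they receive zero weight after the softmax.
    scores = np.where(allowed, scores, -np.inf)
    # Numerically stable softmax over the prompt axis.
    scores = scores - scores.max(axis=1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=1, keepdims=True)
    return weights @ prompts, weights

rng = np.random.default_rng(0)
queries = rng.normal(size=(4, 8))
# Hypothetical layout: rows 0-1 are anatomical prompts, rows 2-3 textual prompts.
prompts = rng.normal(size=(4, 8))
allowed = np.zeros((4, 4), dtype=bool)
allowed[:2, :2] = True   # queries 0-1 see only the anatomical prompts
allowed[2:, 2:] = True   # queries 2-3 see only the textual prompts
out, weights = masked_cross_attention(queries, prompts, allowed)
print(weights.round(2))
```

The printed weights are zero wherever the mask blocks a pair, so each query's update draws only on the prompts assigned to it.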