ZePT: Zero-Shot Pan-Tumor Segmentation via Query-Disentangling and Self-Prompting

Yankai Jiang; Zhongzhen Huang; Rongzhao Zhang; Xiaofan Zhang; Shaoting Zhang

TL;DR

ZePT tackles the long-tailed, multi-organ tumor segmentation challenge by introducing a two-stage, query-disentangling framework. Stage-I learns organ-centric fundamental queries to build robust organ representations, while Stage-II uses self-generated visual prompts to guide advanced tumor queries, aided by cross-modal query-knowledge alignment with medical-domain text embeddings. The approach yields state-of-the-art zero-shot tumor segmentation on MSD and a real-world colon dataset, with strong improvements in AUROC, FPR$_{95}$, and DSC and competitive performance on seen organs. These findings highlight the practical potential of zero-shot pan-tumor segmentation in clinical settings and suggest avenues for further improvement via data augmentation, cross-modal knowledge, and modality expansion.
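To make the reported metrics concrete, here is a minimal sketch of how DSC and FPR$_{95}$ are commonly computed. It is not the authors' evaluation code: the function names, the flat-list mask representation, and the rule for picking the 95% TPR threshold are illustrative assumptions.

```python
import math

def dice_score(pred, target):
    """Dice similarity coefficient (DSC) between two flat binary masks (lists of 0/1)."""
    inter = sum(p and t for p, t in zip(pred, target))
    total = sum(pred) + sum(target)
    # Assumed convention: two empty masks count as a perfect match.
    return 1.0 if total == 0 else 2.0 * inter / total

def fpr_at_95_tpr(scores, labels):
    """False positive rate at the threshold where at least 95% of positives are detected.

    scores: anomaly scores, higher means more likely tumor (assumption).
    labels: 1 for tumor voxels, 0 for everything else.
    """
    pos = sorted((s for s, y in zip(scores, labels) if y == 1), reverse=True)
    neg = [s for s, y in zip(scores, labels) if y == 0]
    # Highest threshold that still keeps >= 95% of the positive scores above it.
    k = math.ceil(0.95 * len(pos))
    threshold = pos[k - 1]
    return sum(s >= threshold for s in neg) / len(neg)
```

A lower FPR$_{95}$ means fewer healthy voxels are flagged once the detector is tuned to catch 95% of tumor voxels; a higher DSC means better mask overlap.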

Abstract

The long-tailed distribution problem in medical image analysis reflects a high prevalence of common conditions and a low prevalence of rare ones, which poses a significant challenge in developing a unified model capable of identifying rare or novel tumor categories not encountered during training. In this paper, we propose a new zero-shot pan-tumor segmentation framework (ZePT) based on query-disentangling and self-prompting to segment unseen tumor categories beyond the training set. ZePT disentangles the object queries into two subsets and trains them in two stages. In the first stage, it learns a set of fundamental queries for organ segmentation through an object-aware feature grouping strategy, which gathers organ-level visual features. In the second stage, it refines the other set of advanced queries, which focus on auto-generated visual prompts for unseen tumor segmentation. Moreover, we introduce query-knowledge alignment at the feature level to enhance each query's discriminative representation and generalizability. Extensive experiments on various tumor segmentation tasks demonstrate that ZePT outperforms previous counterparts and show its promising ability for zero-shot tumor segmentation in real-world settings.
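The self-prompting idea can be illustrated with a small sketch: a pixel whose feature has low affinity to every organ-level (fundamental) query is treated as a candidate abnormality and marked in a prompt mask. This is only a toy illustration under stated assumptions, not the paper's implementation: cosine similarity as the affinity, the `1 - max affinity` score, the fixed `threshold`, and all function names are hypothetical choices.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def anomaly_prompt(pixel_feats, organ_queries, threshold=0.5):
    """Return (scores, mask) for a list of per-pixel feature vectors.

    scores[i] is high when pixel i matches none of the organ queries;
    mask[i] is 1 where the score exceeds the (assumed) threshold.
    """
    scores = []
    for feat in pixel_feats:
        # Affinity of this pixel to its best-matching organ query.
        max_affinity = max(cosine(feat, q) for q in organ_queries)
        scores.append(1.0 - max_affinity)
    mask = [int(s > threshold) for s in scores]
    return scores, mask

# Two toy organ queries and three pixels; the last pixel matches neither organ.
queries = [[1.0, 0.0], [0.0, 1.0]]
pixels = [[1.0, 0.0], [0.0, 2.0], [-1.0, -1.0]]
scores, mask = anomaly_prompt(pixels, queries)
```

In this toy setup, the resulting mask plays the role of the visual prompt that a second set of tumor-oriented (advanced) queries would attend to.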

Paper Structure (24 sections, 9 equations, 8 figures, 14 tables)

This paper contains 24 sections, 9 equations, 8 figures, 14 tables.

Introduction
Related Work
Method
Stage-I: Fundamental Queries for Organs
Stage-II: Advanced Queries for Tumors
Experiments
Main Results
Ablation Study and Discussions
Conclusions
Appendix
Dataset Details
Qualitative Analysis on Real-World Colon Tumor Segmentation Dataset.
Detailed Results of Real-World Colon Tumor Segmentation Analysis.
Additional Ablation Experiments
Different Choices of Text Encoder.
...and 9 more sections

Figures (8)

Figure 1: (a) The long-tailed distribution issue in medical image analysis. (b) ZePT is trained on datasets containing multiple organs and tumors. During inference, ZePT can segment both seen categories (i.e. organs and tumors) and unseen tumors.
Figure 2: Overall pipeline. Stage-I: Based on MaskFormer [cheng2021per; cheng2022masked], we propose an object-aware feature grouping strategy to train a set of fundamental queries for multi-organ segmentation. Stage-II: A set of advanced queries for tumor segmentation attend to visual prompts derived from the affinity between fundamental query embeddings and visual features, which indicates the presence of unseen abnormalities. Finally, we incorporate medical domain knowledge to better align text embeddings with query embeddings for cross-modal reasoning.
Figure 3: Qualitative visualizations on the MSD [antonelli2022medical] dataset. We compare ZePT with other advanced OVSS methods and OOD detection methods in a zero-shot manner. The segmentation results presented from rows one to four correspond, in order, to hepatic vessel tumors, lung tumors, pancreatic tumors, and colorectal tumors. We present the visualizations on other datasets in the supplemental material.
Figure 4: Visualization of query response maps. (a) A test sample containing seen categories from the BTCV [landman2015miccai] evaluation set. (b) Two test samples, one from the MSD's pancreas tumor task [antonelli2022medical] and the other from the real-world colon tumor segmentation dataset. We can observe the query distribution on the different organs and tumors with obvious separation. The clear boundaries and high responses show the advantages of encouraging discriminative and disentangled queries to represent different objects, which benefits the segmentation of both seen and unseen categories.
Figure 5: Qualitative visualizations on real-world colon tumor segmentation dataset. We compare ZePT with other advanced OVSS methods and OOD detection methods in a zero-shot manner.
...and 3 more figures
