Developing Foundation Models for Universal Segmentation from 3D Whole-Body Positron Emission Tomography

Yichi Zhang; Le Xue; Wenbo Zhang; Lanlan Li; Feiyang Xiao; Yuchen Liu; Xiaohui Zhang; Hongwei Zhang; Shuqi Wang; Gang Feng; Liling Peng; Xin Gao; Yuanfan Xu; Yuan Qi; Kuangyu Shi; Hong Zhang; Yuan Cheng; Mei Tian; Zixin Hu

Abstract

Positron emission tomography (PET) is a key nuclear medicine imaging modality that visualizes radiotracer distributions to quantify in vivo physiological and metabolic processes, playing an irreplaceable role in disease management. Despite its clinical importance, the development of deep learning models for quantitative PET image analysis remains severely limited, owing both to the inherent difficulty of segmentation given PET's paucity of anatomical contrast and to the high costs of data acquisition and annotation. To bridge this gap, we develop generalist foundation models for universal segmentation from 3D whole-body PET imaging. We first build the largest and most comprehensive PET dataset to date, comprising 11,041 3D whole-body PET scans with 59,831 segmentation masks for model development. Based on this dataset, we present SegAnyPET, a foundation model with general-purpose applicability to diverse segmentation tasks. Built on a 3D architecture with a prompt engineering strategy for mask generation, SegAnyPET provides universal and scalable organ and lesion segmentation, supports efficient human correction with minimal effort, and enables a clinical human-in-the-loop workflow. Extensive evaluations on multi-center, multi-tracer, multi-disease datasets demonstrate that SegAnyPET achieves strong zero-shot performance across a wide range of segmentation tasks, highlighting its potential to advance the clinical applications of molecular imaging.

Paper Structure (17 sections, 3 equations, 7 figures, 5 tables)

Introduction
Results
Discussion
Methods

Figures (7)

Figure 1: Overview of dataset construction, model development, and validation in our study. (a) We construct PETWB-Seg11K, a large-scale dataset of whole-body PET scans from multi-tracer, multi-vendor, and multi-disease cohorts, with substantial heterogeneity in data distribution. (b) We develop SegAnyPET, a foundation model for universal volumetric PET segmentation. It takes volumetric PET input, encodes features via the Image Encoder, combines them with sparse or dense prompts encoded by the Prompt Encoder, and outputs the segmentation through the Mask Decoder. SegAnyPET supports a human-in-the-loop workflow with interactive prompting to correct segmentation results. (c) Comprehensive evaluation on in-distribution segmentation tasks validates the consistent and reliable performance of SegAnyPET on data with a distribution similar to that of the training set. (d) Extensive assessment on out-of-distribution segmentation tasks demonstrates the generalization capability of SegAnyPET on unseen data in cross-center and cross-tracer settings.
Figure 2: Comparison of SegAnyPET with task-specific segmentation models on internal evaluation for multi-organ segmentation. (a) Quantitative performance comparison with four representative task-specific segmentation models. The center line within each box indicates the median; the bottom and top bounds indicate the 25th and 75th percentiles, respectively. Outlier classes are plotted as individual dots. (b) Performance comparison between SegAnyPET and the most competitive task-specific model, nnU-Net, on five training-visible target organs. (c) Unlike task-specific models, which are trained to segment only training-visible targets, SegAnyPET retains the capability to generalize to new targets. Task-specific models, in contrast, require additional annotation and training when adapted to training-invisible targets.
Figure 3: Quantitative and qualitative comparison of SegAnyPET with other foundation models for promptable PET segmentation. (a) Illustration of slice-wise and volumetric prompting strategies using 2D and 3D segmentation foundation models. For 2D models (e.g., SAM, MedSAM), the PET volume is decomposed into individual slices, point prompts are provided on selected slices, and the resulting 2D predictions are aggregated to form a 3D segmentation. In contrast, 3D foundation models (e.g., SegAnyPET) operate directly on the entire volumetric input and perform end-to-end volumetric segmentation. (b) Quantitative performance comparison of different promptable segmentation foundation models across multiple internal PET segmentation tasks. The Dice similarity coefficient (DSC) is calculated in a volume-wise manner.
Figure 4: Comprehensive evaluation of generalization capability in external cohorts and clinical utility in downstream applications. (a) Quantitative comparison of model generalization performance on out-of-distribution datasets with unseen target organs, modality variations, and novel radiotracer uptake patterns. (b) Evaluation of clinical utility in terms of annotation efficiency. In both lymphoma and lung cancer scenarios, the SegAnyPET-assisted interactive workflow significantly reduces the manual burden compared to conventional manual delineation. (c) Whole-body metabolic covariance network built from SegAnyPET segmentation results. The network indicates that SegAnyPET outputs have high biological fidelity, enabling robust downstream metabolic network analysis for clinical research on systemic disease.
Figure 5: Comparison of point-prompt interaction efficacy between continuous organs and distributed pathological lesions. (a) Segmentation of continuous holistic organs. For such single-entity structures, precise and comprehensive segmentation can be achieved efficiently using only one point prompt. (b) Segmentation of disseminated lesions such as lymphoma. In contrast to organs, systemic oncological findings frequently present as multiple, spatially discrete lesions across the whole body. This example illustrates the practical constraints of the point-based paradigm, demonstrating that a limited number of point prompts is insufficient to simultaneously capture the entire distributed tumor burden. (c) Practical interactive workflow for disseminated lesions. Since a single prompt cannot capture all spatially separated tumor sites, each lesion is individually prompted and segmented in a sequential lesion-wise manner.
...and 2 more figures
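
The Figure 3 caption describes aggregating slice-wise 2D predictions into a 3D segmentation and scoring the result with a volume-wise DSC. The following is a minimal sketch of that evaluation step under stated assumptions, not the authors' implementation: the function names `dice_score` and `aggregate_slices`, the axial stacking order, the toy masks, and the convention of returning 1.0 when both masks are empty are all illustrative choices.

```python
import numpy as np


def dice_score(pred: np.ndarray, target: np.ndarray) -> float:
    """Dice similarity coefficient over a whole volume of binary masks."""
    pred = pred.astype(bool)
    target = target.astype(bool)
    intersection = np.logical_and(pred, target).sum()
    denom = pred.sum() + target.sum()
    if denom == 0:
        # Assumed convention: two empty masks count as perfect agreement.
        return 1.0
    return float(2.0 * intersection / denom)


def aggregate_slices(slice_masks):
    """Stack per-slice 2D predictions along axis 0 into one 3D mask."""
    return np.stack(slice_masks, axis=0)


# Toy ground truth: a 4x4 square present on two of four slices (32 voxels).
target = np.zeros((4, 8, 8), dtype=bool)
target[1:3, 2:6, 2:6] = True

# Hypothetical 2D model output: only one slice was prompted (16 voxels).
slice_preds = [np.zeros((8, 8), dtype=bool) for _ in range(4)]
slice_preds[1][2:6, 2:6] = True

volume_pred = aggregate_slices(slice_preds)
dsc = dice_score(volume_pred, target)
print(f"volume-wise DSC: {dsc:.4f}")
```

In this toy case the prompted slice is segmented perfectly, yet the volume-wise score is penalized for the unprompted slice, which is the kind of gap a volume-wise metric exposes when 2D models are prompted on selected slices only.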
