PAM: A Propagation-Based Model for Segmenting Any 3D Objects across Multi-Modal Medical Images

Zifan Chen, Xinyu Nan, Jiazheng Li, Jie Zhao, Haifeng Li, Ziling Lin, Haoshen Li, Heyun Chen, Yiting Liu, Lei Tang, Li Zhang, Bin Dong

TL;DR

By delivering accurate 3D segmentations from minimal input, PAM lowers reliance on manual annotation and task-specific training, providing an efficient and generalizable tool for automated clinical imaging.

Abstract

Volumetric segmentation is important in medical imaging, but current methods require extensive manual annotation and are tailored to specific tasks, which limits their versatility. General segmentation models developed for natural images do not perform well on the distinctive features of medical images. There is a strong need for an adaptable approach that can handle different 3D medical structures and imaging modalities. In this study, we present PAM (Propagating Anything Model), a segmentation approach that uses a 2D prompt, such as a bounding box or sketch, to create a complete 3D segmentation of a medical image volume. PAM works by modeling relationships between slices, maintaining information flow across the 3D structure. It combines a CNN-based UNet for processing within slices and a Transformer-based attention module for propagating information between slices, leading to better generalizability across imaging modalities. PAM significantly outperformed existing models such as MedSAM and SegVol, with an average improvement of over 18.1% in Dice similarity coefficient (DSC) across 44 medical datasets and various object types. It also showed stable performance under prompt deviations and different propagation setups, and faster inference than the other models. PAM's one-view prompt design made it more efficient, reducing interaction time by about 63.6% compared to two-view prompts. Thanks to its focus on structural relationships, PAM handled unseen and complex objects well, showing a unique ability to generalize to new situations. PAM represents an advance in medical image segmentation, reducing the need for extensive manual work and specialized training. Its adaptability makes it a promising tool for more automated and reliable analysis in clinical settings.
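
For reference, the DSC reported above measures voxel overlap between a predicted mask P and a ground-truth mask G as 2|P∩G| / (|P| + |G|). The following is a minimal NumPy sketch of that metric. The function name `dice_similarity`, the empty-mask convention, and the toy volumes are illustrative assumptions, not taken from the PAM code.

```python
import numpy as np


def dice_similarity(pred: np.ndarray, target: np.ndarray) -> float:
    """Dice similarity coefficient between two binary volumes of equal shape."""
    pred = pred.astype(bool)
    target = target.astype(bool)
    intersection = np.logical_and(pred, target).sum()
    denom = pred.sum() + target.sum()
    if denom == 0:
        # Assumed convention: two empty masks count as perfect agreement.
        return 1.0
    return float(2.0 * intersection / denom)


# Toy 3D volumes (depth, height, width); values are made up for illustration.
pred = np.zeros((4, 8, 8), dtype=bool)
target = np.zeros((4, 8, 8), dtype=bool)
pred[1:3, 2:6, 2:6] = True    # 2 * 4 * 4 = 32 voxels
target[1:3, 3:7, 2:6] = True  # 32 voxels, shifted by one row

# Overlap is 2 * 3 * 4 = 24 voxels, so DSC = 2 * 24 / (32 + 32) = 0.75.
dsc = dice_similarity(pred, target)
print(f"DSC = {dsc:.2f}")
```

A DSC of 1.0 means the two masks coincide exactly, and 0.0 means they share no voxels.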

Paper Structure (26 sections, 6 equations, 6 figures)

Figures (6)

  • Figure 1: PAM is designed for segmenting any 3D objects within various multi-modal 3D medical imaging data. a PAM receives any 3D medical imaging data as input, with users (typically doctors) specifying the target objects for segmentation through prompts. This enables precise and efficient volumetric segmentation of diverse 3D objects, thereby aiding users in enhancing the efficiency of medical analysis and diagnostics. b Type I model: receives a 3D box prompt, predicts each 2D slice using a 2D model, and merges these 2D outcomes into a consolidated 3D prediction. c Type II model: receives a 3D box prompt, predicts each 3D patch using a 3D model, and integrates these patch results into a comprehensive 3D prediction result. d Type III model (ours): receives a 2D box or mask prompt, employs a propagation model to disseminate prompt knowledge across the entire 3D space, resulting in a unified 3D prediction.
  • Figure 2: Workflow and inference process of the propagation-based segment any 3D objects model (PAM). a User interaction: users upload a 3D medical image and specify the segmentation target using either a bounding box (style 1), in accordance with the Response Evaluation Criteria in Solid Tumors (RECIST) guidelines, or a 2D mask applied to the largest slice of the target object (style 2). A bounding box is transformed into a 2D mask by the Box2Mask module for standardized processing in the PropMask module. b PropMask module: this module conducts volumetric segmentation by propagating information between slices. It begins with the 2D mask and its corresponding image slice (the guiding prompt and slice). Adjacent slices are the targets for segmentation. Image features (K and Q) are extracted from the guiding and adjacent slices, respectively, using a shared image encoder. The guiding prompt is converted into multi-scale features (V) through a mask encoder. These features, along with skip connection features from adjacent slices, are assimilated in a prompt-guided decoder to facilitate volumetric segmentation, leveraging the propagation of prompt content across slices. c PAM inference: the user provides a guiding slice and prompt. PAM then propagates the prompt information bidirectionally across slices (yellow arrows). This propagation continues until the boundaries of the 3D image are reached or there is no further content to predict, achieving precise volumetric segmentation.
  • Figure 3: Data characteristics across various datasets. a A circular barplot illustrates the range of data modalities and validation splits across multiple datasets. The innermost ring uses distinct colors to represent different medical imaging modalities (orange for CT; blue for MR; yellow for PET-CT; green for SRX). The second ring differentiates between internal and external datasets, with darker shades indicating internal datasets and lighter shades representing external datasets. The outermost layer displays a bar chart that showcases the distribution of segmented object types across the datasets, with quantities log-scaled for optimal visualization. b Data fingerprints exhibit the key properties of the 44 datasets used in this study (displayed with z-score normalization over all datasets on a scale of one standard deviation around the mean). See Supplementary Tables S1–S4 for details.
  • Figure 4: Quantitative analysis of PAM across various datasets. a Radar chart comparisons of Dice Similarity Coefficient (DSC) among four segmentation models, MedSAM (yellow), SegVol (green), PAM-2DBox (blue), and PAM-2DMask (red), across internal and external datasets. Each radial axis represents one of the 44 datasets used (D1–D44), with DSC values ranging from 0.0 to 1.0, moving from the center outward. b Comparison of inference times (seconds). The left side features a box plot illustrating the distribution of inference times for the four models across 44 datasets. The right side visualizes a comparative analysis of the inference times for each model across these datasets, where the vertical axis represents inference time, and shorter bars indicate faster inference speeds. c Comparison of manual prompt times (seconds). A box plot depicts the distribution of interactive prompt times for three distinct prompt types. Light grey represents the commonly used two-view box prompt (MedSAM and SegVol), medium grey denotes the one-view box prompt of PAM, and dark grey signifies the one-view mask prompt of PAM. The vertical axis indicates the prompt times. d–e Ablation study on the impact of initialization slice deviation. Blue and orange colors represent the PAM-2DBox and PAM-2DMask models, respectively. The box plots show the distribution of DSCs plotted against initialization slice deviation from the RECIST-annotated maximum slice, with deviations of 0% (no deviation), ±5%, ±10%, ±15%, and ±20%. f–g Ablation study on the impact of propagation slice thickness. The box plots display the distribution of DSCs plotted against propagation thickness of 10 mm, 20 mm, 30 mm, and 40 mm.
  • Figure 5: Qualitative analysis and the relationship between object shape and performance. a Comparison of segmentation results across various models. From left to right, the columns represent ground truth, MedSAM, SegVol, PAM-2DBox, and PAM-2DMask, respectively. b DSC change analysis for PAM relative to MedSAM across various box ratios. This plot displays DSC changes, where blue indicates PAM-2DBox and red denotes PAM-2DMask. Each point represents a different object, highlighting the model's adaptability to varying box ratios. c DSC change analysis for PAM relative to MedSAM across various convex ratios. d DSC change analysis for PAM relative to MedSAM across various inverse rotational inertia (IRI). e Comparative visualization of segmentation results for a sample with a low box ratio. The black arrow indicates an irregular area.
  • ...and 1 more figures
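
The inference procedure described in the Figure 2c caption (start from a guiding slice, propagate in both directions, stop at the volume boundary or when nothing more is predicted) can be sketched as a simple loop. This is a hedged illustration only. `toy_propagate` is a hypothetical stand-in for the PropMask network, `propagate_volume` is an invented name, and reusing each newly predicted slice as the next guide is an assumption made for this sketch rather than a detail stated in the source.

```python
import numpy as np


def toy_propagate(guide_img, guide_mask, target_img):
    """Hypothetical stand-in for PropMask.

    Keeps bright target pixels that lie within a one-pixel dilation of the
    guiding mask. The guiding image is unused here; it is kept only so the
    signature mirrors a (guide slice, guide mask, target slice) interface.
    """
    h, w = guide_mask.shape
    padded = np.pad(guide_mask, 1)  # pads with False
    dilated = np.zeros_like(guide_mask)
    for dy in range(3):
        for dx in range(3):
            dilated |= padded[dy:dy + h, dx:dx + w]
    return (target_img > 0.5) & dilated


def propagate_volume(volume, guide_index, guide_mask, step_fn=toy_propagate):
    """Propagate a 2D mask up and down from the guiding slice.

    In each direction the loop stops at the volume boundary or at the first
    slice where the prediction is empty.
    """
    depth = volume.shape[0]
    seg = np.zeros(volume.shape, dtype=bool)
    seg[guide_index] = guide_mask
    for direction in (+1, -1):
        prev_idx, prev_mask = guide_index, guide_mask
        idx = guide_index + direction
        while 0 <= idx < depth:
            mask = step_fn(volume[prev_idx], prev_mask, volume[idx])
            if not mask.any():
                break  # no further content to predict in this direction
            seg[idx] = mask
            # Assumption: the newest prediction becomes the next guide.
            prev_idx, prev_mask = idx, mask
            idx += direction
    return seg


# Toy volume: a bright 4x4 block present in slices 1..5 of a 7-slice volume.
volume = np.zeros((7, 10, 10))
volume[1:6, 3:7, 3:7] = 1.0
guide_index = 3
guide_mask = volume[guide_index] > 0.5  # the user's 2D mask prompt

seg = propagate_volume(volume, guide_index, guide_mask)
print(seg.sum(axis=(1, 2)))  # voxels segmented per slice
```

In a real system `step_fn` would be replaced by the trained model, and propagation might cover several slices per step, as the propagation-thickness ablation in Figure 4 suggests.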