RoMedFormer: A Rotary-Embedding Transformer Foundation Model for 3D Genito-Pelvic Structure Segmentation in MRI and CT

Yuheng Li, Mingzhe Hu, Richard L. J. Qiu, Maria Thor, Andre Williams, Deborah Marshall, Xiaofeng Yang

TL;DR

The paper tackles the underrepresented task of genito-pelvic structure segmentation across MRI and CT, a critical step for accurate pelvic radiotherapy planning. It introduces RoMedFormer, a rotary-embedding transformer foundation model trained through a multi-stage pipeline. The pipeline starts with self-supervised pretraining on large unlabeled CT datasets and continues with supervised fine-tuning on TotalSegmentator and AMOS22. It ends with task-specific adaptation to genito-pelvic anatomy with multi-modality support. The architecture combines 3D patch embeddings, rotary positional embeddings (RoPE), SwiGLU blocks, and a lightweight decoder to capture complex spatial relationships while remaining efficient. Results show competitive segmentation performance across diverse genito-pelvic structures. Accuracy is strong for the genitals and neurovascular bundles, while smaller or low-contrast tissues leave room for improvement. These results underscore the potential of transformer-based foundation models for clinically impactful, multimodal medical image segmentation.
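
Two of the building blocks named above can be made concrete: 3D patch embeddings and SwiGLU feed-forward blocks. The NumPy sketch below shows one common way to realize them. It assumes non-overlapping cubic patches and a plain linear projection. The patch size (16), embedding width (96), and hidden width (256) are illustrative values, not hyperparameters reported for RoMedFormer. The helper names `patchify_3d`, `silu`, and `swiglu_ffn` are hypothetical.

```python
import numpy as np

def patchify_3d(volume, patch):
    """Split a (D, H, W) volume into non-overlapping cubic patches.

    Returns (tokens, coords): tokens has shape (num_patches, patch**3) and
    coords holds the (d, h, w) grid index of each patch in the same order.
    """
    D, H, W = volume.shape
    assert D % patch == 0 and H % patch == 0 and W % patch == 0, "volume must tile evenly"
    gd, gh, gw = D // patch, H // patch, W // patch
    # (gd, p, gh, p, gw, p) -> (gd, gh, gw, p, p, p): group voxels by patch
    blocks = volume.reshape(gd, patch, gh, patch, gw, patch).transpose(0, 2, 4, 1, 3, 5)
    tokens = blocks.reshape(gd * gh * gw, patch ** 3)
    grid = np.meshgrid(np.arange(gd), np.arange(gh), np.arange(gw), indexing="ij")
    coords = np.stack(grid, axis=-1).reshape(-1, 3)
    return tokens, coords

def silu(x):
    """SiLU / Swish activation: x * sigmoid(x)."""
    return x / (1.0 + np.exp(-x))

def swiglu_ffn(x, w_gate, w_up, w_down):
    """SwiGLU feed-forward block: (SiLU(x @ w_gate) * (x @ w_up)) @ w_down."""
    return (silu(x @ w_gate) * (x @ w_up)) @ w_down

# Illustrative sizes only; these are not RoMedFormer's reported hyperparameters.
rng = np.random.default_rng(0)
volume = rng.standard_normal((32, 64, 64)).astype(np.float32)  # toy (D, H, W) scan
tokens, coords = patchify_3d(volume, patch=16)                  # 2 * 4 * 4 = 32 tokens

embed_dim, hidden_dim = 96, 256
w_embed = rng.standard_normal((16 ** 3, embed_dim)) * 0.02      # linear patch projection
x = tokens @ w_embed                                            # (32, 96) token embeddings

w_gate = rng.standard_normal((embed_dim, hidden_dim)) * 0.05
w_up = rng.standard_normal((embed_dim, hidden_dim)) * 0.05
w_down = rng.standard_normal((hidden_dim, embed_dim)) * 0.05
out = swiglu_ffn(x, w_gate, w_up, w_down)                       # (32, 96)
```

A linear projection of flattened patches is equivalent to a 3D convolution whose kernel size and stride both equal the patch size. Many frameworks implement the embedding step that way. The grid coordinates returned alongside the tokens are what a positional scheme such as RoPE would consume.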

Abstract

Deep learning-based segmentation of genito-pelvic structures in MRI and CT is crucial for applications such as radiation therapy, surgical planning, and disease diagnosis. However, existing segmentation models often struggle to generalize across imaging modalities and anatomical variations. In this work, we propose RoMedFormer, a rotary-embedding transformer-based foundation model designed for 3D female genito-pelvic structure segmentation in both MRI and CT. RoMedFormer leverages self-supervised learning and rotary positional embeddings to enhance spatial feature representation and capture long-range dependencies in 3D medical data. We pre-train our model on a diverse dataset of 3D MRI and CT scans and fine-tune it for downstream segmentation tasks. Experimental results demonstrate that RoMedFormer achieves superior performance in segmenting genito-pelvic organs. Our findings highlight the potential of transformer-based architectures in medical image segmentation and pave the way for more transferable segmentation frameworks.
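
Rotary positional embeddings encode position by rotating pairs of query and key features through position-dependent angles. As a result, the attention score between two tokens depends only on their relative offset. The summary does not specify how RoMedFormer extends RoPE to three dimensions. The sketch below assumes one common choice: it splits the feature dimension into three equal chunks and rotates each chunk by the token's depth, height, or width index. The base of 10000, the head dimension of 48, and the names `rope_1d` and `rope_3d` are assumptions for illustration.

```python
import numpy as np

def rope_1d(x, pos, base=10000.0):
    """Rotate feature pairs (2i, 2i+1) of x, shape (..., n, 2m), by angle pos * theta_i."""
    half = x.shape[-1] // 2
    theta = base ** (-np.arange(half) / half)   # theta_i = base^(-2i / chunk_dim)
    ang = pos[:, None] * theta[None, :]         # (n, m) rotation angles
    cos, sin = np.cos(ang), np.sin(ang)
    x1, x2 = x[..., 0::2], x[..., 1::2]
    out = np.empty_like(x)
    out[..., 0::2] = x1 * cos - x2 * sin
    out[..., 1::2] = x1 * sin + x2 * cos
    return out

def rope_3d(x, coords, base=10000.0):
    """Axis-wise 3D RoPE (assumed scheme): one third of the features per axis d, h, w."""
    dim = x.shape[-1]
    assert dim % 6 == 0, "feature dim must split into three even-sized chunks"
    c = dim // 3
    chunks = [rope_1d(x[..., a * c:(a + 1) * c], coords[:, a].astype(float), base)
              for a in range(3)]
    return np.concatenate(chunks, axis=-1)

# 32 tokens on a 2 x 4 x 4 patch grid; head dim 48 is a hypothetical choice.
rng = np.random.default_rng(0)
grid = np.meshgrid(np.arange(2), np.arange(4), np.arange(4), indexing="ij")
coords = np.stack(grid, axis=-1).reshape(-1, 3)
q = rng.standard_normal((32, 48))
k = rng.standard_normal((32, 48))
logits = rope_3d(q, coords) @ rope_3d(k, coords).T / np.sqrt(48)  # attention scores
```

Each rotation is orthogonal, so RoPE leaves feature norms unchanged. Shifting every token's coordinates by the same offset also leaves all attention logits unchanged, because each rotated dot product depends only on the per-axis difference in position.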

Paper Structure

This paper contains 13 sections, 3 figures, and 1 table.

Figures (3)

  • Figure 1: Overview of the Multi-Stage Learning Strategy for Genito-Pelvic Structure Segmentation. The model undergoes three progressive training stages. (1) Self-supervised pretraining: the model learns general anatomical representations from large-scale CT datasets using masked image modeling. (2) Supervised fine-tuning: the model uses annotated datasets, such as the CT dataset TotalSegmentator and the cross-modality CT-MRI dataset AMOS22, to refine segmentation accuracy across multiple organ systems. (3) Task-specific fine-tuning: the model is further adapted to genito-pelvic segmentation using a dedicated dataset. Weight transfer between stages enables efficient feature learning and adaptation.
  • Figure 2: Overview of our segmentation model design.
  • Figure 3: Visualizations of segmentation results on GYN. Columns 1 and 2 show model segmentations on MRI images, and columns 3 and 4 show segmentations on CT. Dark Blue: Bulboclitoris. Light Blue: Genitals. Orange: Left Neurovascular Bundle (Internal Pudendal). Green: Right Neurovascular Bundle (Internal Pudendal). Light Green: Urethra. Red: Paraurethral Gland. Pink: Left Ovary. Light Pink: Right Ovary. Purple: Left Neurovascular Bundle (Inferior Hypogastric). Brown: Right Neurovascular Bundle (Inferior Hypogastric).
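
Figure 1 names two mechanisms: masked image modeling in stage 1 and weight transfer between stages. The sketch below illustrates both under the following assumptions:

  • Patches are masked uniformly at random with a hypothetical ratio of 0.6.
  • The reconstruction loss is mean squared error on masked patches only, as in MAE-style pretraining.
  • Parameters are stored in a name-to-array dictionary.

The paper's exact masking scheme, loss, and checkpoint format are not given in this summary. The names `random_patch_mask`, `masked_reconstruction_loss`, and `transfer_weights` are hypothetical.

```python
import numpy as np

def random_patch_mask(num_patches, mask_ratio, rng):
    """Boolean mask over patches (True = hidden from the encoder), chosen uniformly at random."""
    num_masked = int(round(mask_ratio * num_patches))
    mask = np.zeros(num_patches, dtype=bool)
    mask[rng.permutation(num_patches)[:num_masked]] = True
    return mask

def masked_reconstruction_loss(pred, target, mask):
    """Mean squared error averaged over masked patches only; visible patches add nothing."""
    per_patch = ((pred - target) ** 2).mean(axis=-1)  # (num_patches,)
    return per_patch[mask].mean()

def transfer_weights(src, dst, skip_prefixes=("head.",)):
    """Copy parameters whose names and shapes match from src into dst (in place).

    Parameters whose names start with skip_prefixes, e.g. a stage-specific
    output head, keep their fresh initialization. Returns the copied names.
    """
    copied = []
    for name, weight in src.items():
        if name.startswith(skip_prefixes):
            continue
        if name in dst and dst[name].shape == weight.shape:
            dst[name] = weight.copy()
            copied.append(name)
    return copied

# Stage 1: mask flattened 16^3 patches and score reconstruction on the hidden ones.
rng = np.random.default_rng(0)
tokens = rng.standard_normal((128, 16 ** 3))
mask = random_patch_mask(len(tokens), mask_ratio=0.6, rng=rng)  # hypothetical ratio
model_input = np.where(mask[:, None], 0.0, tokens)  # zeros stand in for a learned mask token
pred = model_input                                  # placeholder for the network's output
loss = masked_reconstruction_loss(pred, tokens, mask)

# Stage 1 -> stage 2: reuse encoder weights, re-initialize the task head.
stage1 = {"encoder.block0.w": rng.standard_normal((96, 96)),
          "head.w": rng.standard_normal((96, 16 ** 3))}   # reconstruction head
stage2 = {"encoder.block0.w": np.zeros((96, 96)),
          "head.w": np.zeros((96, 11))}                  # hypothetical 11-class head
copied = transfer_weights(stage1, stage2)
```

The transfer skips the output head because each stage predicts a different target set. The shape check is a second safeguard: a mismatched layer keeps its fresh initialization instead of raising an error.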