Zero-shot System for Automatic Body Region Detection for Volumetric CT and MR Images

Farnaz Khun Jush, Grit Werner, Mark Klemens, Matthias Lenga

TL;DR

This work tackles the challenge of identifying body regions in volumetric CT and MR images without relying on unreliable DICOM metadata. It evaluates three zero-shot pipelines that repurpose pre-trained segmentation models and general-purpose multimodal large language models (MLLMs): (i) a segmentation-driven rule-based method, (ii) a purely MLLM-based approach guided by radiologist rules, and (iii) a segmentation-aware MLLM that fuses visual input with explicit anatomical evidence. On a dataset of 887 heterogeneous scans with radiologist-verified labels, the segmentation-driven rule-based approach performs best (CT F1 up to 0.984 for chest and 0.996 for pelvis; MR head F1 up to 0.931) and generalizes robustly across modalities and atypical scan coverage. The MLLM-based method is competitive in visually distinctive regions but suffers from context-sensitive failures, while the segmentation-aware MLLM, under zero-shot constraints, often loses precision despite high recall. Overall, the study establishes a practical and interpretable metadata-free solution for body-region detection and highlights both the strengths and the current limitations of foundation models in medical imaging workflows.
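
To make pipeline (i) concrete, the following is a minimal sketch of how a segmentation-driven rule could turn per-organ voxel counts into body-region labels. The organ-to-region map, the `min_voxels` threshold, and the function name `detect_regions` are all hypothetical illustrations; this summary does not state the rules or the segmentation model the authors actually use.

```python
from collections import Counter

# Hypothetical organ-to-region map (an assumption for illustration only;
# the paper's real rule set is not given in this summary).
ORGAN_TO_REGION = {
    "brain": "head",
    "lung": "chest",
    "heart": "chest",
    "liver": "abdomen",
    "kidney": "abdomen",
    "urinary_bladder": "pelvis",
    "hip": "pelvis",
}


def detect_regions(organ_voxels, min_voxels=500):
    """Return the sorted list of regions supported by enough segmented voxels.

    organ_voxels: dict mapping an organ name to the number of voxels a
    pre-trained segmentation model assigned to it in one scan.
    min_voxels: hypothetical threshold that suppresses tiny, likely spurious masks.
    """
    region_voxels = Counter()
    for organ, count in organ_voxels.items():
        region = ORGAN_TO_REGION.get(organ)
        if region is not None:  # ignore organs the rule set does not know
            region_voxels[region] += count
    return sorted(r for r, n in region_voxels.items() if n >= min_voxels)


# A chest scan that clips the top of the liver: the small liver mask
# falls below the threshold, so only "chest" is reported.
example = {"lung": 120_000, "heart": 30_000, "liver": 200, "unknown_structure": 9_000}
print(detect_regions(example))
```

Because the decision is a lookup plus a threshold, every predicted region can be traced back to the organs that triggered it, which is one plausible reading of the interpretability the summary attributes to this approach.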

Abstract

Reliable identification of anatomical body regions is a prerequisite for many automated medical imaging workflows, yet existing solutions remain heavily dependent on unreliable DICOM metadata. Current solutions mainly use supervised learning, which limits their applicability in many real-world scenarios. In this work, we investigate whether body region detection in volumetric CT and MR images can be achieved in a fully zero-shot manner by using knowledge embedded in large pre-trained foundation models. We propose and systematically evaluate three training-free pipelines: (1) a segmentation-driven rule-based system leveraging pre-trained multi-organ segmentation models, (2) a Multimodal Large Language Model (MLLM) guided by radiologist-defined rules, and (3) a segmentation-aware MLLM that combines visual input with explicit anatomical evidence. All methods are evaluated on 887 heterogeneous CT and MR scans with manually verified anatomical region labels. The segmentation-driven rule-based approach achieves the strongest and most consistent performance, with weighted F1-scores of 0.947 (CT) and 0.914 (MR), demonstrating robustness across modalities and atypical scan coverage. The MLLM performs competitively in visually distinctive regions, while the segmentation-aware MLLM reveals fundamental limitations.
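
For readers unfamiliar with the weighted F1-score reported above, here is a small sketch of one common way to compute it. It assumes a multi-label setting (a scan may cover several regions) and the usual support-weighted average of per-region F1, the same convention as `average="weighted"` in scikit-learn. Whether the authors use exactly this definition is an assumption, and the function names and toy labels are illustrative only.

```python
def f1_per_region(y_true, y_pred, regions):
    """Per-region F1 and support for multi-label predictions.

    y_true, y_pred: lists of sets, one set of region names per scan.
    """
    scores, support = {}, {}
    for r in regions:
        tp = sum(1 for t, p in zip(y_true, y_pred) if r in t and r in p)
        fp = sum(1 for t, p in zip(y_true, y_pred) if r not in t and r in p)
        fn = sum(1 for t, p in zip(y_true, y_pred) if r in t and r not in p)
        denom = 2 * tp + fp + fn
        scores[r] = 2 * tp / denom if denom else 0.0
        support[r] = tp + fn  # number of scans that truly contain region r
    return scores, support


def weighted_f1(y_true, y_pred, regions):
    """Average of per-region F1, weighted by each region's support."""
    scores, support = f1_per_region(y_true, y_pred, regions)
    total = sum(support.values())
    return sum(scores[r] * support[r] for r in regions) / total if total else 0.0


# Toy labels (not data from the paper).
regions = ["chest", "abdomen", "pelvis"]
y_true = [{"chest"}, {"chest", "abdomen"}, {"pelvis"}]
y_pred = [{"chest"}, {"chest"}, {"pelvis", "abdomen"}]
print(weighted_f1(y_true, y_pred, regions))  # 0.75
```

In the toy example, chest and pelvis are predicted perfectly (F1 = 1) while abdomen is missed once and falsely predicted once (F1 = 0). With supports of 2, 1, and 1, the weighted score is (2·1 + 1·0 + 1·1) / 4 = 0.75.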

Paper Structure (15 sections, 5 figures, 5 tables)

Figures (5)

  • Figure 1: Example anatomical regions delineated on a whole-body MR image (sagittal and coronal views).
  • Figure 2: Example anatomical regions delineated on a whole-body CT image (sagittal and coronal views).
  • Figure 3: Overview of the zero-shot segmentation-driven rule-based approach.
  • Figure 4: Overview of the zero-shot MLLM approach.
  • Figure 5: Overview of the segmentation-aware MLLM approach.