
M3DA: Benchmark for Unsupervised Domain Adaptation in 3D Medical Image Segmentation

Boris Shirokikh, Anvar Kurmukov, Mariia Donskova, Valentin Samokhin, Mikhail Belyaev, Ivan Oseledets

TL;DR

M3DA addresses the lack of a large public benchmark for unsupervised domain adaptation (UDA) in 3D medical image segmentation. It assembles eight clinically relevant domain shifts from four public datasets (AMOS, BraTS, CC359, LIDC) into eight tasks across 22 problems. The authors establish a standardized evaluation protocol with an nnU-Net-based baseline and an oracle, and evaluate more than ten UDA methods spanning discrepancy-based, self-training, adversarial, image-level, and augmentation strategies, plus foundation-model backbones. Their extensive experiments show that no method consistently closes the domain gap: the best method closes only about 62% of the gap on average. The results highlight the need for novel, robust techniques and underline the strong impact of generic augmentations. The work further shows that M3DA supports DA paradigms beyond UDA (e.g., supervised DA, source-free DA, test-time adaptation, and domain generalization) and provides a clear, public benchmark to accelerate progress toward robust, scalable 3D medical image segmentation in heterogeneous, real-world clinical settings.

Abstract

Domain shift presents a significant challenge in applying deep learning to the segmentation of 3D medical images from modalities such as Magnetic Resonance Imaging (MRI) and Computed Tomography (CT). Although numerous domain adaptation (DA) methods have been developed to address this issue, they are often evaluated under impractical data shift scenarios. Specifically, the medical imaging datasets used are often private, too small for robust training and evaluation, or limited to single or synthetic tasks. To overcome these limitations, we introduce M3DA /ˈmɛdə/, a benchmark comprising four publicly available multiclass segmentation datasets. We design eight domain pairs featuring diverse and practically relevant distribution shifts. These include inter-modality shifts between MRI and CT, and intra-modality shifts across MRI acquisition parameters, CT radiation doses, and the presence or absence of contrast enhancement. Within the proposed benchmark, we evaluate more than ten existing DA methods. Our results show that none of them consistently closes the performance gap between domains: the most effective method reduces the gap by only about 62% on average across the tasks. This highlights the need for novel DA algorithms that improve the robustness and scalability of deep learning models in medical imaging. The M3DA benchmark is publicly available at https://github.com/BorisShirokikh/M3DA.
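The reported percentage is the fraction of the gap between the baseline and the oracle that a DA method recovers. The following Python sketch computes that quantity, assuming a higher-is-better score such as Dice and per-task gaps averaged with equal weight; the paper may aggregate differently, and the function names and scores below are hypothetical.

```python
from statistics import mean


def gap_closed(baseline: float, method: float, oracle: float) -> float:
    """Fraction of the baseline-to-oracle gap recovered by a DA method.

    Assumes higher-is-better scores (e.g. Dice) and oracle > baseline.
    0.0 means no improvement over the baseline; 1.0 means oracle level.
    """
    if oracle <= baseline:
        raise ValueError("oracle score must exceed baseline score")
    return (method - baseline) / (oracle - baseline)


def mean_gap_closed(results):
    """Equal-weight average of gap_closed over (baseline, method, oracle) tuples."""
    return mean(gap_closed(b, m, o) for b, m, o in results)


# Hypothetical (baseline, method, oracle) scores, NOT results from the paper.
tasks = [(0.50, 0.62, 0.70), (0.40, 0.55, 0.80)]
print(f"{mean_gap_closed(tasks):.0%}")  # prints 49%
```

Normalizing by the baseline-to-oracle distance lets tasks with very different absolute scores be averaged on a common 0-to-1 scale.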

Paper Structure

This paper contains 36 sections, 1 equation, 7 figures, 11 tables.

Figures (7)

  • Figure 1: Using the best DA method in the M3DA benchmark closes only 62% of the performance gap between domains on average. Here, the percentage indicates how much of the gap between the baseline level and the oracle level (outer circle) is closed.
  • Figure 2: Examples from individual domains in M3DA without segmentation masks, for visual comparison between domains. Left to right, top to bottom: CT to MR, CT to LDCT, CT CE to CT native, CE T1 to T1, T1 Field (1.5T to 3T), T1 Scanner (Philips to Siemens). Segmentation mask visualizations for the same examples are provided in the Supplementary materials.
  • Figure 3: Overview of the UDA pipeline for semantic segmentation. Some methods do not require target-domain images during training, e.g., nnAugm, IN, AdaBN.
  • Figure 4: Comparison of DA methods with and without augmentations.
  • Figure 5: Examples from individual domains in M3DA with the corresponding segmentation masks. Left to right, top to bottom: CT to MR, CT to LDCT, CT CE to CT native, CE T1 to T1, T1 Field (1.5T to 3T), T1 Scanner (Philips to Siemens). Different colors correspond to different segmentation classes.
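Figure 3 lists AdaBN among the methods that need no target-domain images during training. The general AdaBN (adaptive batch normalization) idea is to keep the trained weights, including the batch-norm affine parameters, and replace the source-domain batch-norm statistics with statistics re-estimated on unlabeled target-domain features after training. The NumPy sketch below illustrates that idea on a synthetic 3D feature tensor; the helper names, the data, and the single-layer setup are hypothetical, and this is not the benchmark's implementation.

```python
import numpy as np


def channel_stats(features: np.ndarray):
    """Per-channel mean and variance of a (N, C, D, H, W) feature tensor."""
    axes = (0, 2, 3, 4)  # reduce over batch and spatial axes, keep channels
    return features.mean(axis=axes), features.var(axis=axes)


def batch_norm(x, mean, var, gamma, beta, eps=1e-5):
    """Inference-mode batch norm: normalize with given stats, then apply gamma/beta."""
    shape = (1, -1, 1, 1, 1)  # broadcast per-channel vectors over (N, C, D, H, W)
    x_hat = (x - mean.reshape(shape)) / np.sqrt(var.reshape(shape) + eps)
    return gamma.reshape(shape) * x_hat + beta.reshape(shape)


rng = np.random.default_rng(0)
channels = 4
gamma, beta = np.ones(channels), np.zeros(channels)  # stand-ins for learned affine params

# Synthetic activations: the target domain has shifted and rescaled intensities.
source = rng.normal(0.0, 1.0, size=(2, channels, 8, 8, 8))
target = rng.normal(3.0, 2.0, size=(2, channels, 8, 8, 8))

src_mean, src_var = channel_stats(source)  # statistics stored after source training
tgt_mean, tgt_var = channel_stats(target)  # AdaBN: re-estimated on unlabeled target data

no_adapt = batch_norm(target, src_mean, src_var, gamma, beta)  # source stats on target data
adabn = batch_norm(target, tgt_mean, tgt_var, gamma, beta)     # target stats, same gamma/beta

print(channel_stats(no_adapt)[0].round(2))
print(channel_stats(adabn)[0].round(2))
```

With source statistics, the normalized target activations stay centered far from zero; with target statistics they are re-centered near zero with unit variance, which is the mismatch AdaBN is meant to remove without any target labels.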