DG-TTA: Out-of-domain Medical Image Segmentation through Augmentation and Descriptor-driven Domain Generalization and Test-Time Adaptation

Christian Weihsbach; Christian N. Kruse; Alexander Bigalke; Mattias P. Heinrich

TL;DR

This paper tackles the degraded segmentation performance that pre-trained medical image models show on out-of-domain data. It introduces DG-TTA, a two-stage approach within nnUNet: a domain-generalized pre-training stage that combines GIN augmentation with SSC descriptors, followed by a test-time adaptation stage that enforces prediction consistency across augmented views of each unseen scan. Across five public datasets covering abdominal, spine, and cardiac anatomy, DG-TTA yields significant cross-domain improvements in Dice and HD95, with notable gains in CT→MR scenarios and particularly strong performance when pre-training on the large TotalSegmentator (TS) dataset. The method is compact, uses source and target data independently, and is open-source, offering a practical way to bridge domain gaps without labeling target-domain data.

Abstract

Purpose: Applying pre-trained medical deep learning segmentation models to out-of-domain images often yields predictions of insufficient quality. In this study, we propose to use a powerful generalizing descriptor along with augmentation to enable domain-generalized pre-training and test-time adaptation, achieving high-quality segmentation in unseen domains.

Materials and Methods: In this retrospective study, five publicly available datasets (2012 to 2022) including 3D CT and MRI images are used to evaluate segmentation performance in out-of-domain scenarios. The settings include abdominal, spine, and cardiac imaging. The data is randomly split into training and test samples. Domain-generalized pre-training on source data is used to obtain the best initial performance in the target domain. We introduce the combination of the generalizing SSC descriptor and GIN intensity augmentation for optimal generalization. Segmentation results are subsequently optimized at test time, where we propose to adapt the pre-trained models for every unseen scan with a consistency scheme using the same augmentation-descriptor combination. The segmentation is evaluated using Dice similarity and Hausdorff distance, and the significance of improvements is tested with the Wilcoxon signed-rank test.

Results: The proposed generalized pre-training and subsequent test-time adaptation improve model performance significantly in CT to MRI cross-domain prediction for abdominal (+46.2% and +28.2% Dice), spine (+72.9%), and cardiac (+14.2% and +55.7% Dice) scenarios (p<0.001).

Conclusion: Our method enables optimal, independent usage of medical image source and target data and bridges domain gaps successfully with a compact and efficient methodology. Open-source code available at: https://github.com/multimodallearning/DG-TTA
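The abstract reports results as Dice similarity per anatomical label. As a minimal sketch of that metric (not the authors' evaluation code), the snippet below computes Dice for one label from two integer label maps. The function name `dice_score` and the toy arrays are illustrative assumptions; returning NaN when a label is absent from both maps is one possible convention, not one stated in the paper.

```python
import numpy as np


def dice_score(pred, target, label):
    """Dice similarity for a single label in two integer label maps.

    Hypothetical helper for illustration; returns NaN if the label is
    absent from both prediction and target (an assumed convention).
    """
    pred_mask = np.asarray(pred) == label
    target_mask = np.asarray(target) == label
    denom = pred_mask.sum() + target_mask.sum()
    if denom == 0:
        return float("nan")
    overlap = np.logical_and(pred_mask, target_mask).sum()
    return 2.0 * overlap / denom


# Toy 2D example; real inputs would be 3D CT or MRI label volumes.
pred = np.array([[1, 1, 0],
                 [0, 2, 2]])
target = np.array([[1, 0, 0],
                   [0, 2, 2]])

print(dice_score(pred, target, label=1))  # over-segmented label
print(dice_score(pred, target, label=2))  # perfectly matched label
```

In the toy example, label 1 covers two predicted voxels and one target voxel with an overlap of one voxel, so its Dice is 2·1/(2+1) = 2/3, while label 2 matches exactly and scores 1.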

Paper Structure (22 sections, 6 figures, 4 tables)

Introduction
Materials and methods
Study design and patients
Datasets
BTCV: Multi-Atlas Labeling Beyond the Cranial Vault
AMOS: A Large-Scale Abdominal Multi-Organ Benchmark for Versatile Medical Image Segmentation
MMWHS: Multi-Modality Whole Heart Segmentation
SPINE: MyoSegmenTUM spine
TS: TotalSegmentator, 104 labels
Pre-/postprocessing
Related work
Domain generalization
Test-time adaptation
Proposed method
Source domain domain-generalized pre-training
...and 7 more sections

Figures (6)

Figure 1: Study flowchart. Data from five publicly available datasets was included and combined to result in several out-of-domain CT > MR prediction scenarios (their combination is indicated by the red arrows) [landman2015miccai, ji2022amos, wasserthal2023totalsegmentator, burian2019lumbar, zhuang2019evaluation]. We randomly extracted subsamples for a source and target data ratio of at least 2:1. For the MMWHS dataset, we split the training and test data to include individual patients only (no paired data across training and testing).
Figure 2: Our proposed method consists of two steps that should be combined to reach optimal performance but can generally be used independently. Both steps rely on input feature modification to improve model generalization and enable unsupervised model adaptation at test time. Left: Model pre-training with source domain data. We propose to use GIN augmentation [ouyang2022causality] and the SSC descriptor [heinrich2013towards] in this step. Right: TTA is applied in the target data domain. Two different augmented versions of the same input are passed through the pre-trained segmentation network. The network weights are then optimized, supervising the predictions with a Dice loss and steering the network to produce consistent predictions. After inverse spatial transformations, consistency masking is applied to filter non-matching regions.
Figure 3: Dice loss landscapes given scalar probability values $\hat{y}_A$ and $\hat{y}_B$ for different exponents $d \in \{1, 2\}$ in Eq. \ref{eq:loss}. $d=2$ yields zero loss along the diagonal, which is favorable for consistency.
Figure 4: Base (BS) and adapted model (+A) performance of several methods bridging a CT > MR domain gap in abdominal organ segmentation. For the batch normalization model NNUNET BN, in addition to adapting all parameters, we evaluated adapting only the normalization layer parameters (+A-nor) or only the encoder (+A-enc). Ordinate shows Dice scores in %. Median (---) and mean (+) are indicated for boxes. The significance of improvement over the source NNUNET BS base model is shown above boxes (*p<0.05; **p<0.01; ***p<0.001). The right part of the figure shows a zoomed-in view of the three rightmost methods.
Figure 5: Base (BS) and adapted (+A) model performance given in Dice similarity % for several cross-domain prediction scenarios. Top row and bottom left: TS pre-trained models. Bottom right: MMWHS CT pre-trained models with only 12 training samples. Ordinate shows Dice scores in %. Median (---) and mean (+) are indicated for boxes. The significance of improvement over the source NNUNET BS base model is shown above boxes (*p<0.05; **p<0.01; ***p<0.001).
...and 1 more figures
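
The Figure 3 caption states that an exponent of $d=2$ gives zero Dice loss whenever the two predictions agree. The loss equation itself is not reproduced on this page, so the sketch below assumes a common soft Dice form, $1 - 2\sum \hat{y}_A \hat{y}_B / (\sum \hat{y}_A^d + \sum \hat{y}_B^d)$, which may differ in detail from the authors' definition. The function name `dice_consistency_loss` and the `eps` stabilizer are illustrative assumptions.

```python
import numpy as np


def dice_consistency_loss(y_a, y_b, d=2, eps=1e-8):
    """Soft Dice loss between two probability maps.

    Assumed form (not taken from the paper):
        1 - 2 * sum(y_a * y_b) / (sum(y_a**d) + sum(y_b**d))
    `eps` only guards against division by zero.
    """
    y_a = np.asarray(y_a, dtype=float)
    y_b = np.asarray(y_b, dtype=float)
    intersection = np.sum(y_a * y_b)
    denominator = np.sum(y_a ** d) + np.sum(y_b ** d)
    return 1.0 - 2.0 * intersection / (denominator + eps)


# Two identical, uncertain scalar predictions (a point on the diagonal).
p = 0.5
loss_d1 = dice_consistency_loss(p, p, d=1)
loss_d2 = dice_consistency_loss(p, p, d=2)
print(f"d=1: {loss_d1:.4f}")
print(f"d=2: {loss_d2:.4f}")
```

Under this assumed form, identical predictions of 0.5 give a loss of about 0.5 for $d=1$ (the denominator $0.5 + 0.5$ exceeds twice the product $0.25$), whereas for $d=2$ numerator and denominator coincide and the loss is approximately zero. With $d=1$, agreeing but uncertain predictions would still be penalized, which is consistent with the caption's remark that $d=2$ is favorable for consistency.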
