Test-Time Modality Generalization for Medical Image Segmentation

Ju-Hyeon Nam, Sang-Chul Lee

TL;DR

The paper addresses the generalization of medical image segmentation to unseen imaging modalities. It introduces Test-Time Modality Generalization (TTMG), a framework that combines Modality-Aware Style Projection (MASP) and Modality-Sensitive Instance Whitening (MSIW). MASP uses a modality classifier and a bank of modality-specific style bases to project features from an unseen modality onto the style distributions of the seen modalities. MSIW measures how much each feature covariance varies across modalities, treats the high-variance entries as modality-sensitive, and whitens only those. Both components are trained with a multi-term loss that preserves content and modality distinctions. Across 11 datasets spanning colonoscopy, ultrasound, dermoscopy, and radiology, TTMG achieves superior segmentation on unseen modalities without additional training, outperforming traditional domain generalization (DG) methods while remaining efficient. The work points to a practical way of deploying a single model across diverse clinical settings, potentially reducing data curation and retraining needs in real-world medical imaging tasks.
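The selective whitening idea summarized above can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names (`instance_covariance`, `sensitive_mask`, `whitening_loss`), the `top_fraction` threshold, and the choice to mask only strictly upper-triangular covariance entries are all assumptions made for illustration.

```python
import numpy as np

def instance_covariance(feat):
    """Channel covariance (C, C) of one feature map of shape (C, H, W)."""
    c = feat.shape[0]
    x = feat.reshape(c, -1)
    x = x - x.mean(axis=1, keepdims=True)
    return x @ x.T / (x.shape[1] - 1)

def sensitive_mask(covs, top_fraction=0.1):
    """Mark covariance entries whose variance across modalities is highest.

    covs: (M, C, C) stack with one covariance matrix per modality.
    top_fraction is a hypothetical hyperparameter, not taken from the paper.
    """
    var = covs.var(axis=0)                      # elementwise variance across modalities
    iu = np.triu_indices(var.shape[0], k=1)     # off-diagonal, upper triangle only
    k = max(1, int(round(top_fraction * iu[0].size)))
    thresh = np.sort(var[iu])[-k]               # k-th largest variance
    mask = np.zeros(var.shape, dtype=bool)
    mask[iu] = var[iu] >= thresh
    return mask

def whitening_loss(cov, mask):
    """Mean absolute covariance over the masked (assumed sensitive) entries."""
    return np.abs(cov[mask]).sum() / max(mask.sum(), 1)

# Toy usage: three "modalities", four channels, random features.
rng = np.random.default_rng(0)
covs = np.stack([instance_covariance(rng.normal(size=(4, 8, 8))) for _ in range(3)])
mask = sensitive_mask(covs, top_fraction=0.5)
loss = whitening_loss(covs[0], mask)
```

Minimizing such a loss pushes only the masked covariances toward zero, which is the sense in which the whitening is selective rather than applied to the full covariance matrix.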

Abstract

Generalizable medical image segmentation is essential for ensuring consistent performance across diverse unseen clinical settings. However, existing methods often overlook the capability to generalize effectively across arbitrary unseen modalities. In this paper, we introduce a novel Test-Time Modality Generalization (TTMG) framework, which comprises two core components: Modality-Aware Style Projection (MASP) and Modality-Sensitive Instance Whitening (MSIW), designed to enhance generalization in arbitrary unseen modality datasets. MASP estimates the likelihood of a test instance belonging to each seen modality and maps it onto a distribution using modality-specific style bases, guiding its projection effectively. Furthermore, as high feature covariance hinders generalization to unseen modalities, MSIW is applied during training to selectively suppress modality-sensitive information while retaining modality-invariant features. By integrating MASP and MSIW, the TTMG framework demonstrates robust generalization capabilities for medical image segmentation in unseen modalities, a challenge that current methods have largely neglected. We evaluated TTMG alongside other domain generalization techniques across eleven datasets spanning four modalities (colonoscopy, ultrasound, dermoscopy, and radiology), consistently achieving superior segmentation performance across various modality combinations.
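The projection step described in the abstract can be sketched as a probability-weighted restyling of a feature map. The sketch below is an assumption-laden illustration, not the paper's method: it represents each style basis as a per-channel mean and standard deviation and uses an AdaIN-style renormalization, and the names `project_style`, `base_mean`, and `base_std` are hypothetical.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax over a 1-D array of logits."""
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def project_style(feat, logits, base_mean, base_std, eps=1e-5):
    """Restyle a feature map toward a mixture of seen-modality styles.

    feat:      (C, H, W) feature map of a test instance
    logits:    (M,) modality classifier outputs for that instance
    base_mean: (M, C) assumed per-modality channel means (style bases)
    base_std:  (M, C) assumed per-modality channel standard deviations
    """
    probs = softmax(logits)                       # likelihood of each seen modality
    mu = feat.mean(axis=(1, 2), keepdims=True)
    sigma = feat.std(axis=(1, 2), keepdims=True)
    normalized = (feat - mu) / (sigma + eps)      # strip the instance's own style
    target_mean = probs @ base_mean               # (C,) probability-weighted style
    target_std = probs @ base_std
    out = normalized * target_std[:, None, None] + target_mean[:, None, None]
    return out, probs

# Toy usage with M = 3 seen modalities and C = 4 channels.
rng = np.random.default_rng(0)
feat = rng.normal(loc=3.0, scale=2.0, size=(4, 8, 8))
base_mean = rng.normal(size=(3, 4))
base_std = rng.uniform(0.5, 1.5, size=(3, 4))
projected, probs = project_style(feat, np.array([2.0, 0.5, -1.0]), base_mean, base_std)
```

Because the content (the normalized feature) is left untouched and only the channel statistics change, a sketch like this needs no gradient updates at test time, which matches the "without additional training" property claimed for TTMG.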

Paper Structure

This paper contains 23 sections, 10 equations, 15 figures, 23 tables, 1 algorithm.

Figures (15)

  • Figure 1: (a) Existing medical image segmentation methods: Performance evaluation on unseen modalities is challenging, often requiring retraining and hyperparameter optimization for each unseen modality dataset, which incurs substantial costs. (b) Schematic diagram of the proposed TTMG framework for unseen modality generalizable medical image segmentation: When an arbitrary unseen modality instance is input, TTMG projects the misaligned, unseen modality features into a well-aligned, seen modality feature distribution using modality-specific style bases, guided by the probability of the instance belonging to each modality without additional training.
  • Figure 2: (a) The overall architecture of the proposed framework, called Test-Time Modality Generalization (TTMG), which mainly comprises MASP (Figure 2(b)) and MSIW (Algorithm 1). (b) MASP stage ($M = 3$ in this figure). (c) Notation description used in this paper.
  • Figure 3: Qualitative comparison of other methods and TTMG on Radiology and Ultrasound modalities. (a) Input images with ground truth. (b) Baseline (DeepLabV3+ with ResNet50 [chen2018encoder]). (c) IN [ulyanov2017improved]. (d) IW [huang2018decorrelated]. (e) IBN [pan2018two]. (f) RobustNet [choi2021robustnet]. (g) SAN-SAW [peng2022semantic]. (h) SPCNet [huang2023style]. (i) BlindNet [ahn2024style]. (j) TTMG (Ours). In this figure, green and red lines denote the boundaries of the ground truth and the prediction, respectively.
  • Figure 4: t-SNE visualization [van2008visualizing] of features for different modalities (a) before and (b) after test-time style projection from Stage 1.
  • Figure 5: Visualization of covariance matrix extracted from baseline and TTMG.
  • ...and 10 more figures