Unified and Semantically Grounded Domain Adaptation for Medical Image Segmentation

Xin Wang, Yin Guo, Jiamin Xia, Kaiyu Zhang, Niranjan Balu, Mahmud Mossa-Basha, Linda Shapiro, Chun Yuan

TL;DR

This work introduces a unified, semantically grounded framework that supports both source-accessible and source-free adaptation in medical imaging, and achieves state-of-the-art results in both settings, with source-free performance closely approaching its source-accessible counterpart.

Abstract

Most prior unsupervised domain adaptation approaches for medical image segmentation are narrowly tailored to either the source-accessible setting, where adaptation is guided by source-target alignment, or the source-free setting, which typically resorts to implicit adaptation mechanisms such as pseudo-labeling and network distillation. This substantial divergence in methodological designs between the two settings reveals an inherent flaw: the lack of an explicit, structured construction of anatomical knowledge that naturally generalizes across domains and settings. To bridge this longstanding divide, we introduce a unified, semantically grounded framework that supports both source-accessible and source-free adaptation. Fundamentally distinct from all prior works, our framework's adaptability emerges naturally as a direct consequence of the model architecture, without relying on explicit cross-domain alignment strategies. Specifically, our model learns a domain-agnostic probabilistic manifold as a global space of anatomical regularities, mirroring how humans establish visual understanding. Thus, the structural content in each image can be interpreted as a canonical anatomy retrieved from the manifold and a spatial transformation capturing individual-specific geometry. This disentangled, interpretable formulation enables semantically meaningful prediction with intrinsic adaptability. Extensive experiments on challenging cardiac and abdominal datasets show that our framework achieves state-of-the-art results in both settings, with source-free performance closely approaching its source-accessible counterpart, a level of consistency rarely observed in prior works. The results provide a principled foundation for anatomically informed, interpretable, and unified solutions for domain adaptation in medical imaging. The code is available at https://github.com/wxdrizzle/remind

Paper Structure

This paper contains 48 sections, 13 equations, 12 figures, 10 tables.

Figures (12)

  • Figure 1: Graphical models of the proposed framework. (a) Generative model. (b) Inference model with hierarchical decomposition. Deterministic variables are in double circles, and observed variables are shaded. Dashed arrows denote selecting the subset $\{\mathbf{z}^{{l_j}}\}_{{l_j}\in\Lambda=\{l_1,\ldots,l_J\}}$.
  • Figure 2: Network architecture for the proposed framework. Without loss of generality, the illustration utilizes $L=3$, $\Lambda=\{1,2,3\}$, and $M=4$. The Gaussian (resp. Laplacian) distributions are represented by feature maps whose two halves of channels correspond to the mean and variance (resp. scale), with the latter obtained via a Softplus function. Random sampling is performed during training and is replaced by taking the mathematical expectation during evaluation. The purple boxes correspond to the calculation of loss terms using related outputs.
  • Figure 3: Qualitative comparison of our method and the baselines that achieve the best overall performance (VAMCEI and ProtoContra for the source-accessible and source-free settings, respectively). Yellow arrows indicate inferior results.
  • Figure 4: Disentanglement of canonical anatomy and geometry by our model. We visualize the templates $\mathbf{z}$ by decoding them into intermediate segmentations $\widehat{\mathbf{y}\circ\boldsymbol{\phi}}$ and reconstructions $\widehat{\mathbf{x}\circ\boldsymbol{\phi}}$ using the segmentation and reconstruction decoders. We also show the corresponding deformations $\boldsymbol{\phi}^{-1}$, as well as the final segmentations $\widehat{\mathbf{y}}$ and reconstructions $\widehat{\mathbf{x}}$ obtained after warping by $\boldsymbol{\phi}^{-1}$.
  • Figure 5: Inter-image traversal on the MS-CMRSeg dataset. Each row denotes the decoded segmentations corresponding to an interpolation $\mathcal{T}_\alpha(\mathbf{w},\mathbf{w}^\prime)$ between the composition weights $\mathbf{w},\mathbf{w}^\prime$ extracted from two images $\mathbf{x},\mathbf{x}^\prime$. "Target" and "Source" indicate the image domains.
  • ...and 7 more figures
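
The Figure 2 caption describes how a Gaussian distribution is read off a feature map: one half of the channels gives the mean, the other half is passed through a Softplus to give the variance, and sampling during training is replaced by the expectation during evaluation. The following is a minimal NumPy sketch of that parameterization only. It is not taken from the authors' repository; the function names (`split_gaussian_params`, `draw_latent`), the `(C, H, W)` layout, and the choice of which half holds the mean are assumptions made for illustration.

```python
import numpy as np


def softplus(x):
    # Numerically stable softplus, log(1 + exp(x)); always non-negative.
    return np.logaddexp(0.0, x)


def split_gaussian_params(feature_map):
    """Split a (C, H, W) feature map into mean and variance halves.

    Assumption: the first C/2 channels hold the mean and the last C/2
    hold the pre-Softplus variance. The caption does not fix the order.
    """
    channels = feature_map.shape[0]
    if channels % 2 != 0:
        raise ValueError("expected an even number of channels")
    half = channels // 2
    mean = feature_map[:half]
    variance = softplus(feature_map[half:])
    return mean, variance


def draw_latent(feature_map, training, rng=None):
    """Sample during training; return the expectation during evaluation."""
    mean, variance = split_gaussian_params(feature_map)
    if not training:
        # The expectation of a Gaussian is its mean.
        return mean
    rng = np.random.default_rng() if rng is None else rng
    noise = rng.standard_normal(mean.shape)
    # Reparameterized sample: mean + standard deviation * unit noise.
    return mean + np.sqrt(variance) * noise
```

Calling `draw_latent(features, training=False)` returns the mean half unchanged, which matches the caption's description of evaluation. The Laplacian case in the caption would follow the same split, with the second half interpreted as a scale rather than a variance.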