Enhancing Cross-Modal Medical Image Segmentation through Compositionality

Aniek Eijpe, Valentina Corbetta, Kalina Chupetlovska, Regina Beets-Tan, Wilson Silva

TL;DR

This work tackles cross-modal medical image segmentation under substantial domain shift between imaging modalities. It introduces compositionality as an inductive bias, learning content representations via learnable von Mises-Fisher (vMF) kernels; this enables content-style disentanglement and reduces model complexity. The approach combines cross-modal translation with a compositional representation module to produce interpretable, spatially discriminative features ($Z_{vMF}$) used for segmentation. It achieves improved performance on MM-WHS and CHAOS while lowering computational costs. The findings suggest that compositional content representations can enhance generalization across modalities and offer meaningful insights into the segmentation process, with practical impact for multi-modality clinical workflows.
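The vMF-kernel step can be made concrete with a small NumPy sketch. It assumes a formulation common in compositional networks:

- features and kernel means are L2-normalized onto the unit sphere;
- each kernel scores a pixel by $\kappa\,\mu_j^\top z_i$ with a single fixed concentration $\kappa$;
- the scores are normalized across kernels at every pixel.

The function name `vmf_activations`, the value $\kappa = 30$, and the feature sizes are illustrative assumptions, not the authors' code.

```python
import numpy as np


def vmf_activations(z, kernels, kappa=30.0):
    """Soft-assign every spatial feature vector of z to J vMF kernels (illustrative sketch).

    z:       feature map of shape (C, H, W), e.g. Z_y from the target encoder
    kernels: kernel mean directions of shape (J, C), one row per vMF kernel
    kappa:   concentration, assumed fixed and shared by all kernels
    Returns Z_vMF of shape (J, H, W); at each pixel the J values sum to 1.
    """
    c, h, w = z.shape
    # Flatten the spatial grid and project each feature vector onto the unit sphere.
    feats = z.reshape(c, h * w)
    feats = feats / (np.linalg.norm(feats, axis=0, keepdims=True) + 1e-8)
    # Project the kernel means onto the unit sphere as well.
    mu = kernels / (np.linalg.norm(kernels, axis=1, keepdims=True) + 1e-8)
    # kappa * mu_j^T z_i is the vMF log-likelihood up to log C_d(kappa),
    # which is the same for every kernel and cancels in the normalization.
    logits = kappa * (mu @ feats)  # shape (J, H*W)
    # Normalize across kernels (softmax over J); subtract the max for stability.
    logits = logits - logits.max(axis=0, keepdims=True)
    probs = np.exp(logits)
    probs = probs / probs.sum(axis=0, keepdims=True)
    return probs.reshape(-1, h, w)


# Usage with random stand-ins (hypothetical sizes: 64 feature channels, 32x32 grid).
rng = np.random.default_rng(0)
z_y = rng.standard_normal((64, 32, 32))
k_vmf = rng.standard_normal((10, 64))  # 10 kernels, matching the 10 channels in Figure 3
z_vmf = vmf_activations(z_y, k_vmf)
print(z_vmf.shape)  # (10, 32, 32)
```

You can inspect each of the $J$ channels separately, or take the per-pixel argmax over them. Either is one simple way to see which kernel "claims" which image region, in the spirit of the channel visualizations in Figure 3.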

Abstract

Cross-modal medical image segmentation presents a significant challenge, as different imaging modalities produce images with varying resolutions, contrasts, and appearances of anatomical structures. We introduce compositionality as an inductive bias in a cross-modal segmentation network to improve segmentation performance and interpretability while reducing complexity. The proposed network is an end-to-end cross-modal segmentation framework that enforces compositionality on the learned representations using learnable von Mises-Fisher kernels. These kernels facilitate content-style disentanglement in the learned representations, resulting in compositional content representations that are inherently interpretable and effectively disentangle different anatomical structures. The experimental results demonstrate enhanced segmentation performance and reduced computational costs on multiple medical datasets. Additionally, we demonstrate the interpretability of the learned compositional features. Code and checkpoints will be publicly available at: https://github.com/Trustworthy-AI-UU-NKI/Cross-Modal-Segmentation.
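For reference, the standard vMF density for a unit feature vector $\mathbf{z}_i \in \mathbb{R}^d$ is given below, where $\boldsymbol{\mu}_j$ is the mean direction of kernel $j$, $\kappa$ is the concentration, and $I_{\nu}$ is the modified Bessel function of the first kind. It is followed by one plausible way to turn per-kernel likelihoods into the $J$ channels of $Z_{vMF}$: normalizing over the kernels at each pixel $i$. The normalization step and the shared $\kappa$ are assumptions about the framework; this summary does not confirm them.

$$
p(\mathbf{z}_i \mid \boldsymbol{\mu}_j, \kappa) = C_d(\kappa)\, \exp\!\left(\kappa\, \boldsymbol{\mu}_j^{\top}\mathbf{z}_i\right),
\qquad
C_d(\kappa) = \frac{\kappa^{d/2-1}}{(2\pi)^{d/2}\, I_{d/2-1}(\kappa)},
\qquad
\lVert \mathbf{z}_i \rVert = \lVert \boldsymbol{\mu}_j \rVert = 1
$$

$$
Z_{vMF}^{(j)}(i) = \frac{p(\mathbf{z}_i \mid \boldsymbol{\mu}_j, \kappa)}{\sum_{k=1}^{J} p(\mathbf{z}_i \mid \boldsymbol{\mu}_k, \kappa)}
= \frac{\exp\!\left(\kappa\, \boldsymbol{\mu}_j^{\top}\mathbf{z}_i\right)}{\sum_{k=1}^{J} \exp\!\left(\kappa\, \boldsymbol{\mu}_k^{\top}\mathbf{z}_i\right)}
$$

Because $C_d(\kappa)$ is shared by all kernels, it cancels. Each channel is therefore a softmax over cosine similarities scaled by $\kappa$, and a larger $\kappa$ makes each pixel's assignment to a kernel sharper.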

Paper Structure

This paper contains 15 sections, 6 equations, 3 figures, and 2 tables.

Figures (3)

  • Figure 1: Overview of the proposed framework. $X$ and $Y$ denote the source and target domains, from which the encoders $E_x$ and $E_y$ extract deep features into $Z$. From $Z$, the deep features can be translated to either domain with the generators $G_x$ and $G_y$, or compositional representations $Z_{vMF}$ can be obtained via the vMF kernels ($K_{vMF}$). From $Z_{vMF}$, the segmentation model $S$ predicts the final segmentation masks. $D_x$ and $D_y$ denote the domain discriminators.
  • Figure 2: Visual overview of learning a compositional representation $\mathbf{Z_{vMF}}$ from the representation $\mathbf{Z_y}$ containing the deep features of a single target image $y$.
  • Figure 3: Visual results of our proposed method segmenting the , , , with target images, and the liver parenchyma with target T2-SPIR images, shown alongside the 10 channels of the compositional representation.
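The data flow in Figure 1 can be traced with a toy NumPy sketch. Every component is a hypothetical placeholder (a fixed random 1×1 convolution), the sizes are illustrative, and the vMF step assumes a per-pixel normalization over kernels. The discriminators $D_x$ and $D_y$ are left out because they presumably only supply adversarial training signals and are not on the path from an image to a mask. Nothing here reflects the authors' actual architectures or training losses.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes (assumptions, not taken from the paper).
C_IMG, C_FEAT, J, N_CLASSES = 1, 64, 10, 5


def conv1x1(in_ch, out_ch):
    """Hypothetical stand-in for a sub-network: a fixed random 1x1 convolution."""
    weight = rng.standard_normal((out_ch, in_ch)) / np.sqrt(in_ch)
    return lambda t: np.einsum("oc,chw->ohw", weight, t)


# Components named as in Figure 1; each one is a toy placeholder.
E_x, E_y = conv1x1(C_IMG, C_FEAT), conv1x1(C_IMG, C_FEAT)  # encoders into Z
G_x, G_y = conv1x1(C_FEAT, C_IMG), conv1x1(C_FEAT, C_IMG)  # generators back to image space
S = conv1x1(J, N_CLASSES)                                  # segmentation model
K_vMF = rng.standard_normal((J, C_FEAT))                   # vMF kernel mean directions


def to_vmf(z, kernels, kappa=30.0):
    """Per-pixel softmax over kernels of kappa * cosine(feature, kernel)."""
    f = z / (np.linalg.norm(z, axis=0, keepdims=True) + 1e-8)
    mu = kernels / (np.linalg.norm(kernels, axis=1, keepdims=True) + 1e-8)
    logits = kappa * np.einsum("jc,chw->jhw", mu, f)
    logits = logits - logits.max(axis=0, keepdims=True)
    p = np.exp(logits)
    return p / p.sum(axis=0, keepdims=True)


def forward(image, encoder):
    """Encode an image into Z, then branch into translation and segmentation."""
    z = encoder(image)
    translations = {"to_x": G_x(z), "to_y": G_y(z)}  # translate to either domain
    z_vmf = to_vmf(z, K_vMF)                          # compositional representation
    mask = S(z_vmf).argmax(axis=0)                    # predicted label per pixel
    return translations, z_vmf, mask


y = rng.standard_normal((C_IMG, 32, 32))  # a toy "target" image
translations, z_vmf, mask = forward(y, E_y)
print(translations["to_x"].shape, z_vmf.shape, mask.shape)  # (1, 32, 32) (10, 32, 32) (32, 32)
```

The point of the sketch is the branching: one encoding $Z$ feeds both the generators and the vMF kernels. As a result, the segmentation model $S$ only ever sees the compositional representation $Z_{vMF}$, never $Z$ directly.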