Robust Semi-supervised Multimodal Medical Image Segmentation via Cross Modality Collaboration

Xiaogen Zhou, Yiyou Sun, Min Deng, Winnie Chiu Wing Chu, Qi Dou

TL;DR

This framework uses a cross modality collaboration strategy to distill the modality-independent knowledge carried by each modality and integrates it in a unified fusion layer. A channel-wise semantic consistency loss then aligns this modality-independent information across modalities at the feature level.

Abstract

Multimodal learning leverages complementary information derived from different modalities, thereby enhancing performance in medical image segmentation. However, prevailing multimodal learning methods rely heavily on extensive well-annotated data from various modalities to achieve accurate segmentation. This dependence often poses a challenge in clinical settings, where such data are scarce. Moreover, the inherent anatomical misalignment between different imaging modalities makes it harder still to improve segmentation performance. To address these problems, we propose a novel semi-supervised multimodal segmentation framework that is robust to scarce labeled data and misaligned modalities. Our framework employs a novel cross modality collaboration strategy to distill modality-independent knowledge, which is inherently associated with each modality, and integrates this information into a unified fusion layer for feature amalgamation. With a channel-wise semantic consistency loss, our framework aligns modality-independent information across modalities at the feature level, thereby fortifying it against misalignments in multimodal scenarios. Furthermore, our framework integrates contrastive consistent learning to regulate anatomical structures, facilitating anatomy-wise prediction alignment on unlabeled data in semi-supervised segmentation tasks. Our method achieves competitive performance compared to other multimodal methods across three tasks: cardiac, abdominal multi-organ, and thyroid-associated orbitopathy segmentation. It also demonstrates outstanding robustness in scenarios involving scarce labeled data and misaligned modalities.
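The abstract does not give the form of the channel-wise semantic consistency loss, so the following is only an illustrative sketch of one way such a loss could behave, not the authors' formulation. It assumes each modality's feature map is pooled to one scalar per channel, the pooled values are turned into a distribution over channels, and the two distributions are compared with a symmetric KL divergence. The function names (`channel_descriptor`, `channel_consistency_loss`) and the choice of pooling and divergence are hypothetical. Because the spatial dimensions are pooled away, a spatial shift between modalities leaves this particular loss unchanged, which is the intuition behind aligning features channel-wise rather than voxel-wise.

```python
import math


def channel_descriptor(feature_map):
    """Global-average-pool each channel to a single scalar.

    feature_map: list of channels, each a flat list of voxel activations.
    (Assumed layout for illustration only.)
    """
    return [sum(channel) / len(channel) for channel in feature_map]


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def channel_consistency_loss(feat_a, feat_b):
    """Symmetric KL divergence between the channel distributions of two
    modalities. This is an assumed stand-in, not the loss from the paper."""
    p = softmax(channel_descriptor(feat_a))
    q = softmax(channel_descriptor(feat_b))
    kl_pq = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))
    kl_qp = sum(qi * math.log(qi / pi) for pi, qi in zip(p, q))
    return 0.5 * (kl_pq + kl_qp)


# Two channels, four voxels each. The second map holds the same values
# rolled by one voxel, mimicking a spatial misalignment between modalities.
feat_a = [[1.0, 2.0, 3.0, 4.0], [0.0, 0.5, 0.5, 1.0]]
feat_a_shifted = [[4.0, 1.0, 2.0, 3.0], [1.0, 0.0, 0.5, 0.5]]
# A map whose channel statistics genuinely differ.
feat_b = [[0.0, 0.0, 0.0, 0.0], [3.0, 3.0, 3.0, 3.0]]

loss_shifted = channel_consistency_loss(feat_a, feat_a_shifted)
loss_different = channel_consistency_loss(feat_a, feat_b)
```

In this toy setup the shifted pair yields a loss of zero while the pair with different channel statistics yields a positive loss. A real implementation would operate on batched 3D tensors in a deep learning framework.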

Paper Structure (9 sections, 3 equations, 3 figures, 2 tables)

This paper contains 9 sections, 3 equations, 3 figures, 2 tables.

Figures (3)

  • Figure 1: The pipeline of our framework. It consists of three main components. (1) Two 3D foundation model-driven encoders are employed to capture 3D volumetric representations from different modalities. (2) A novel cross-modality collaborative strategy is introduced to fuse information and align feature maps. (3) A contrastive consistent learning module is exploited to generate consistent predictions for unlabeled data.
  • Figure 2: Comparison of segmentation performance across different models on the MS-CMRSeg dataset using various labeled data ratios, evaluated in terms of Dice score.
  • Figure 3: (a) and (b) Visual comparison between CML and our method on the TAO and MS-CMRSeg datasets. (c) Ablation analysis of the cross modality collaboration (CMC) strategy. (d) Ablation study of training our model with single and multiple modalities on the MS-CMRSeg dataset.
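
The comparisons in Figure 2 are reported as Dice scores. As a reminder of what that metric measures, here is a minimal sketch of the standard per-class Dice coefficient, 2|A ∩ B| / (|A| + |B|), over flattened label maps. The function name `dice_score` and the convention of returning 1.0 when a class is absent from both prediction and ground truth are assumptions made for illustration; the paper's evaluation code may handle that case differently.

```python
def dice_score(pred, target, label):
    """Dice coefficient for one class over flattened label maps.

    pred, target: equal-length sequences of integer class labels.
    label: the class to evaluate.
    """
    pred_mask = [p == label for p in pred]
    target_mask = [t == label for t in target]
    intersection = sum(p and t for p, t in zip(pred_mask, target_mask))
    denominator = sum(pred_mask) + sum(target_mask)
    if denominator == 0:
        # Class absent from both maps: treated as a perfect match here
        # (an assumed convention).
        return 1.0
    return 2.0 * intersection / denominator


# Toy example: six voxels, classes 0 (background), 1 and 2.
pred = [0, 1, 1, 2, 2, 0]
target = [0, 1, 2, 2, 2, 0]

dice_class_1 = dice_score(pred, target, 1)  # 2*1 / (2+1)
dice_class_2 = dice_score(pred, target, 2)  # 2*2 / (2+3)
```

Multi-class results are commonly summarized by averaging the per-class scores over the foreground classes.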