Complementary Information Mutual Learning for Multimodality Medical Image Segmentation

Chuyun Shen, Wenhao Li, Haoqing Chen, Xiaoling Wang, Fengping Zhu, Yuxin Li, Xiangfeng Wang, Bo Jin

TL;DR

This work addresses the challenge of inter-modal redundancy in multimodal medical image segmentation by introducing Complementary Information Mutual Learning (CIML). CIML combines inductive-bias-driven task decomposition (assigning primary versus auxiliary modalities to unimodal subtasks) with redundancy filtering that leverages the variational information bottleneck and cross-modal spatial attention to extract complementary information from auxiliary modalities. The framework yields a two-fold benefit: physically reducing dependence between modalities and algorithmically extracting non-redundant information that improves segmentation accuracy, demonstrated on BraTS2020, autoPET, MICCAI HECKTOR 2022, and a ShapeComposition demonstration. The results show CIML outperforms state-of-the-art methods in Dice and HD95, while enabling Grad-CAM-based visualization of cross-modal contributions, enhancing interpretability and clinical trust. The work contributes a principled addition-based approach to multimodal fusion, with practical impact on robust, explainable medical image segmentation.

Abstract

Radiologists must draw on images from multiple modalities for tumor segmentation and diagnosis because of the limitations of medical imaging and the diversity of tumor signals. This has driven the development of multimodal learning for segmentation. However, redundancy among modalities creates challenges for existing subtraction-based joint learning methods, such as misjudging the importance of modalities, ignoring modality-specific information, and increasing cognitive load. These issues ultimately decrease segmentation accuracy and increase the risk of overfitting. This paper presents the complementary information mutual learning (CIML) framework, which mathematically models and addresses the negative impact of inter-modal redundant information. CIML adopts the idea of addition and removes inter-modal redundant information through inductive bias-driven task decomposition and message passing-based redundancy filtering. CIML first decomposes the multimodal segmentation task into multiple subtasks based on expert prior knowledge, minimizing the information dependence between modalities. Furthermore, CIML introduces a scheme in which each modality can extract information from other modalities additively through message passing. To ensure that the extracted information is non-redundant, redundancy filtering is recast as complementary information learning, inspired by the variational information bottleneck. The complementary information learning procedure can be solved efficiently by variational inference and cross-modal spatial attention. Numerical results on the verification task and standard benchmarks indicate that CIML efficiently removes redundant information between modalities, outperforming SOTA methods in validation accuracy and segmentation quality.
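
For orientation, the generic variational information bottleneck that the abstract names as the inspiration for redundancy filtering can be written as below. This is the standard textbook formulation, not necessarily the exact objective used in CIML. The notation is assumed here for illustration: $x$ is the input, $y$ the label, $z$ the learned embedding, $\beta$ a trade-off weight, $q(y \mid z)$ a variational decoder, and $r(z)$ a variational prior.

$$
\max_{p(z \mid x)} \; I(Z; Y) - \beta \, I(Z; X)
$$

$$
\mathcal{L}_{\mathrm{VIB}} = \mathbb{E}_{(x,y)} \, \mathbb{E}_{z \sim p(z \mid x)} \big[ -\log q(y \mid z) \big] + \beta \, \mathbb{E}_{x} \big[ \mathrm{KL}\big( p(z \mid x) \,\|\, r(z) \big) \big]
$$

Minimizing $\mathcal{L}_{\mathrm{VIB}}$ optimizes a variational lower bound on the first objective (up to additive constants): the first term keeps the embedding predictive of the label, while the KL term penalizes information retained about the input.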

Paper Structure (38 sections, 49 equations, 15 figures, 6 tables)


Figures (15)

  • Figure 1: Diagram of the Addition and Subtract operations. The dark gray circle with black slashes and the dark gray circle without black slashes represent the same information in embeddings. The Subtract operation first concatenates the information from different modalities and then eliminates redundancy. The Addition operation first eliminates cross-modal redundant information and then concatenates the embeddings.
  • Figure 2: Dataset annotation for BraTS2020. Displayed are image patches with tumor structures annotated in various modalities (bottom left) and the final labels for the entire dataset (right). Image patches show, from left to right: (a) the FLAIR image and the whole tumor (WT) visible in FLAIR; (b) the T2 image and the tumor core (TC) visible in T2; (c) the T1CE image and the enhancing tumor (ET) visible in T1CE (yellow), surrounding the necrotic and non-enhancing tumor core (red); (d) final labels of the tumor structures: edema (green), ET (yellow), necrotic and non-enhancing tumor core (red). In the BraTS2020 challenge, images must be segmented into WT, TC, and ET regions.
  • Figure 3: Illustration of the complementary information mutual learning (CIML) framework for the BraTS2020 and autoPET challenges. The input to each segmentor consists of multimodal images that are specific to each modality. After processing, the segmentors send a portion of the embeddings as messages to other segmentors to assist with other sub-tasks and accept messages from other segmentors to extract efficient information. The dark blue lines with bi-directional arrows represent the message passing. Finally, the segmentors complete their sub-tasks. The MICCAI HECKTOR 2022 challenge uses a framework similar to that of the autoPET challenge.
  • Figure 4: The segmentor contains three parts: a message generator, a complementary information filter, and a predictor.
  • Figure 5: Schematic of the network architecture of the segmentor. The generator $G_{FLAIR}$ extracts features from FLAIR images individually. The Complementary Information Filter (CIF) module extracts complementary information from messages, and the predictor $P_{FLAIR}$ generates the final segmentation.
  • ...and 10 more figures
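
As a rough illustration of the flow described in Figures 3 to 5 (a segmentor receives a message from another modality, filters it, and fuses the result additively with its own features), the sketch below implements a plain cross-modal attention step in pure Python. It is a simplified sketch under stated assumptions, not the authors' CIF module: the learned query/key/value projections and the variational term are omitted, and the names `cross_modal_attention`, `fuse_additively`, `flair_features`, and `t2_message` are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_modal_attention(primary, message):
    """Attend from primary-modality features to an auxiliary message.

    primary, message: lists of N feature vectors (one per spatial location),
    each of length C. Assumption: identity projections stand in for the
    learned query/key/value layers a real model would use.
    """
    scale = math.sqrt(len(primary[0]))
    attended = []
    for query in primary:
        # Similarity between this location and every message location.
        scores = [sum(q * k for q, k in zip(query, key)) / scale for key in message]
        weights = softmax(scores)
        # Weighted average of message vectors, channel by channel.
        attended.append([
            sum(w * value[c] for w, value in zip(weights, message))
            for c in range(len(query))
        ])
    return attended


def fuse_additively(primary, attended):
    """Add the attended message to the primary features."""
    return [
        [p + a for p, a in zip(p_vec, a_vec)]
        for p_vec, a_vec in zip(primary, attended)
    ]


# Toy 2x2 feature maps, flattened to 4 locations with 2 channels each.
flair_features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]]
t2_message = [[0.5, 0.5], [1.0, 0.0], [0.0, 1.0], [0.2, 0.8]]

attended = cross_modal_attention(flair_features, t2_message)
fused = fuse_additively(flair_features, attended)
```

In this toy setup, a primary location with an all-zero feature vector scores every message location equally, so it simply receives the mean of the message vectors. A learned filter would instead be trained to pass only information the primary modality lacks.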