Complementary Information Mutual Learning for Multimodality Medical Image Segmentation
Chuyun Shen, Wenhao Li, Haoqing Chen, Xiaoling Wang, Fengping Zhu, Yuxin Li, Xiangfeng Wang, Bo Jin
TL;DR
This work addresses inter-modal redundancy in multimodal medical image segmentation by introducing Complementary Information Mutual Learning (CIML). CIML combines inductive-bias-driven task decomposition, which assigns primary and auxiliary modalities to unimodal subtasks, with redundancy filtering that uses the variational information bottleneck and cross-modal spatial attention to extract complementary information from the auxiliary modalities. The framework yields a two-fold benefit: it physically reduces the dependence between modalities, and it algorithmically extracts non-redundant information that improves segmentation accuracy. This is demonstrated on BraTS2020, autoPET, MICCAI HECKTOR 2022, and a ShapeComposition demonstration task. The results show that CIML outperforms state-of-the-art methods in Dice and HD95, while Grad-CAM-based visualization of cross-modal contributions improves interpretability and clinical trust. The work contributes a principled addition-based approach to multimodal fusion, with practical impact on robust, explainable medical image segmentation.
Abstract
Radiologists must use images from multiple modalities for tumor segmentation and diagnosis because of the limitations of medical imaging and the diversity of tumor signals. This has led to the development of multimodal learning for segmentation. However, redundancy among modalities creates challenges for existing subtraction-based joint learning methods, such as misjudging the importance of modalities, ignoring modality-specific information, and increasing cognitive load. These issues ultimately decrease segmentation accuracy and increase the risk of overfitting. This paper presents the complementary information mutual learning (CIML) framework, which mathematically models and addresses the negative impact of inter-modal redundant information. CIML adopts the idea of addition and removes inter-modal redundant information through inductive-bias-driven task decomposition and message-passing-based redundancy filtering. CIML first decomposes the multimodal segmentation task into multiple subtasks based on expert prior knowledge, minimizing the information dependence between modalities. Furthermore, CIML introduces a scheme in which each modality can additively extract information from the other modalities through message passing. To ensure that the extracted information is non-redundant, redundancy filtering is recast as complementary information learning, inspired by the variational information bottleneck. The complementary information learning procedure can be solved efficiently by variational inference and cross-modal spatial attention. Numerical results from the verification task and standard benchmarks indicate that CIML efficiently removes redundant information between modalities, outperforming state-of-the-art (SOTA) methods in validation accuracy and segmentation quality.
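To make the variational information bottleneck idea mentioned above more concrete, the following is a minimal sketch of a generic VIB-style objective: a task loss plus a weighted KL term that penalizes how much information a stochastic code retains. It is not the CIML objective itself. The function names, the diagonal-Gaussian encoder, the standard-normal prior, and the weight `beta` are all illustrative assumptions rather than details taken from the paper.

```python
import math
import random


def kl_diag_gaussian_to_standard_normal(mu, log_var):
    """KL( N(mu, diag(exp(log_var))) || N(0, I) ), summed over dimensions.

    Assumption: the encoder outputs a diagonal Gaussian and the prior is
    a standard normal, as in the usual VIB setup.
    """
    return 0.5 * sum(
        math.exp(lv) + m * m - 1.0 - lv for m, lv in zip(mu, log_var)
    )


def reparameterize(mu, log_var, rng):
    """Sample z = mu + sigma * eps with eps ~ N(0, 1) (reparameterization trick)."""
    return [
        m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
        for m, lv in zip(mu, log_var)
    ]


def vib_loss(task_loss, mu, log_var, beta=1e-3):
    """Generic VIB-style objective: task term plus beta-weighted compression term.

    `task_loss` stands in for a segmentation loss (hypothetical here);
    `beta` trades off task accuracy against how much the code retains.
    """
    return task_loss + beta * kl_diag_gaussian_to_standard_normal(mu, log_var)


if __name__ == "__main__":
    rng = random.Random(0)
    mu, log_var = [1.0, 0.0], [0.0, 0.0]
    z = reparameterize(mu, log_var, rng)
    print("sampled code:", z)
    print("loss:", vib_loss(task_loss=1.0, mu=mu, log_var=log_var, beta=2.0))
```

In this sketch, a larger `beta` pushes the code toward the prior, so less information passes through. That is the general mechanism by which a bottleneck can discourage redundant information from being carried along.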
