CCSD: Cross-Modal Compositional Self-Distillation for Robust Brain Tumor Segmentation with Missing Modalities

Dongqing Xie, Yonghuang Wu, Zisheng Ai, Jun Min, Zhencun Jiang, Shaojin Geng, Lei Wang

TL;DR

CCSD addresses brain tumor segmentation from multi-modal MRI under missing modalities by introducing a shared-specific encoder-decoder with two self-distillation strategies. Hierarchical Modality Self-Distillation (HMSD) transfers knowledge across modality hierarchies, from full to partial modality sets, while Decremental Modality Combination Distillation (DMCD) simulates progressive modality dropout along a criticality-informed path to improve robustness. Evaluations on BraTS benchmarks show state-of-the-art performance and strong stability across diverse missing-modality scenarios, with ablations confirming the contribution of each distillation component. The approach offers a practical, scalable solution for clinical settings where complete multi-modal data are often unavailable, and it may extend to other multi-modal medical imaging tasks with minimal architectural changes.

Abstract

The accurate segmentation of brain tumors from multi-modal MRI is critical for clinical diagnosis and treatment planning. While integrating complementary information from various MRI sequences is a common practice, the frequent absence of one or more modalities in real-world clinical settings poses a significant challenge, severely compromising the performance and generalizability of deep learning-based segmentation models. To address this challenge, we propose a novel Cross-Modal Compositional Self-Distillation (CCSD) framework that can flexibly handle arbitrary combinations of input modalities. CCSD adopts a shared-specific encoder-decoder architecture and incorporates two self-distillation strategies: (i) a hierarchical modality self-distillation mechanism that transfers knowledge across modality hierarchies to reduce semantic discrepancies, and (ii) a progressive modality combination distillation approach that enhances robustness to missing modalities by simulating gradual modality dropout during training. Extensive experiments on public brain tumor segmentation benchmarks demonstrate that CCSD achieves state-of-the-art performance across various missing-modality scenarios, with strong generalization and stability.
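As a rough illustration of what "arbitrary combinations of input modalities" and "gradual modality dropout" involve, the sketch below enumerates the modality combinations of each size and pairs every combination with the combinations obtained by dropping one modality. It then scores a pair with a generic temperature-softened KL divergence. This is not the authors' implementation. The four modality names (the usual BraTS sequences), the helper names `combos_of_size`, `decremental_pairs`, and `kd_loss`, the default temperature, and the choice of KL divergence are all assumptions made for illustration. The criticality-informed dropout path described in the paper is not modeled here.

```python
import math
from itertools import combinations

# Assumption: the four MRI sequences commonly provided in BraTS.
MODALITIES = ("T1", "T1ce", "T2", "FLAIR")


def combos_of_size(modalities, n):
    """Return all modality combinations containing exactly n modalities."""
    return [frozenset(c) for c in combinations(modalities, n)]


def decremental_pairs(modalities):
    """Return (teacher, student) pairs where the student drops one modality.

    Hypothetical pairing rule: every combination with at least two
    modalities teaches each of its one-smaller subsets.
    """
    pairs = []
    for n in range(len(modalities), 1, -1):
        for teacher in combos_of_size(modalities, n):
            for dropped in teacher:
                pairs.append((teacher, teacher - {dropped}))
    return pairs


def softmax(logits, tau):
    """Temperature-softened softmax over a list of logits."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp((z - m) / tau) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kd_loss(teacher_logits, student_logits, tau=2.0):
    """KL(teacher || student) between temperature-softened distributions."""
    p = softmax(teacher_logits, tau)
    q = softmax(student_logits, tau)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


if __name__ == "__main__":
    for n in range(1, len(MODALITIES) + 1):
        print(n, "modalities:", len(combos_of_size(MODALITIES, n)), "combinations")
    print("teacher/student pairs:", len(decremental_pairs(MODALITIES)))
```

With four modalities there are 15 non-empty combinations in total, so a model that must handle every one of them benefits from sharing knowledge between neighbouring combinations rather than training each in isolation.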

Paper Structure

This paper contains 17 sections, 7 equations, 5 figures, 5 tables.

Figures (5)

  • Figure 1: Left: Traditional methods handle missing-modality scenarios but lack interaction mechanisms among different modality combinations, resulting in significant performance drops when fewer modalities are available. Right: We propose a cross-modal compositional self-distillation approach that introduces two strategies to enable knowledge transfer across hierarchical modality combinations, and simulates missing modalities during training to improve robustness and maintain performance under partial modality inputs.
  • Figure 2: Overview. Left: We employ two encoders to capture both shared and specific features for each modality. For all possible modality combinations, these two types of features are concatenated along both feature and channel dimensions, followed by a lightweight convolutional layer to obtain the fused features for the corresponding modality combination. Based on the fused features, we implement Hierarchical Modality Self-Distillation (HMSD) and Decremental Modality Combination Distillation (DMCD). The two distillation losses, together with the segmentation loss computed from the decoder using the fused features, are jointly used to optimize the model parameters. During inference, the model can adapt to various scenarios involving missing modalities. Right: Algorithmic illustration of DMCD. $\mathcal{S}_N^n$: the set of all modality combinations containing exactly $n$ of the $N$ modalities available under ideal conditions; $\mathcal{S}_i$: a specific modality combination; $\tau$: temperature.
  • Figure 3: Visualization. Green: edema; yellow: enhancing tumor; and red: necrotic and non-enhancing tumor core. GT: Ground Truth.
  • Figure 4: Comparison of average Dice scores across varying numbers of modalities.
  • Figure 5: Comparison of AURC scores between our method and four baseline methods on the three segmentation tasks: ET, TC, and WT. AURC evaluates the overall performance and robustness of a model under varying numbers of missing modalities by computing the area under the curve of average Dice scores with respect to the number of available modalities. A higher AURC indicates better stability and performance, especially when modalities are missing.
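
The AURC described in the Figure 5 caption can be sketched as a trapezoidal area under the curve of average Dice versus number of available modalities. The caption does not specify the integration rule or any normalization, so the trapezoidal rule, the optional division by the width of the modality range, and the function name `aurc` are assumptions. The Dice values in the usage example are made-up placeholders, not results from the paper.

```python
from typing import Sequence


def aurc(num_modalities: Sequence[int],
         avg_dice: Sequence[float],
         normalize: bool = True) -> float:
    """Area under the (number of modalities, average Dice) curve.

    Assumption: trapezoidal integration. With normalize=True the area is
    divided by the width of the modality range, so the result stays on
    the same 0-1 scale as the Dice score.
    """
    if len(num_modalities) != len(avg_dice):
        raise ValueError("inputs must have the same length")
    if len(num_modalities) < 2:
        raise ValueError("at least two points are needed")

    # Sort by number of modalities so the curve is traversed left to right.
    points = sorted(zip(num_modalities, avg_dice))
    width = points[-1][0] - points[0][0]
    if width == 0:
        raise ValueError("modality counts must span a non-zero range")

    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += 0.5 * (y0 + y1) * (x1 - x0)  # trapezoid between neighbours

    return area / width if normalize else area


if __name__ == "__main__":
    # Placeholder values for illustration only.
    counts = [1, 2, 3, 4]
    dice = [0.70, 0.80, 0.85, 0.90]
    print(f"AURC (normalized): {aurc(counts, dice):.4f}")
```

Because the area aggregates Dice over every modality count, a model that degrades sharply with one or two modalities scores lower than one with the same full-modality Dice but a flatter curve.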