Dealing with All-stage Missing Modality: Towards A Universal Model with Robust Reconstruction and Personalization

Yunpeng Zhao, Cheng Chen, Qing You Pang, Quanzheng Li, Carol Tang, Beng-Ti Ang, Yueming Jin

TL;DR

The paper tackles all-stage missing modalities in multimodal brain MRI segmentation, where inputs may be modality-incomplete during both training and testing, by proposing a universal framework that combines robust modality reconstruction with scenario-specific model personalization. It introduces a distribution-approximation strategy for training a multimodal masked autoencoder, followed by a data-model co-distillation scheme that uses the reconstructed full-modality data to guide learning under incomplete inputs. A CLIP-driven hyper-network personalizes a subset of model parameters for each missing-modality scenario, addressing distribution heterogeneity. Evaluations on BraTS2018 and BraTS2020 show consistent gains over state-of-the-art methods under all-stage missing-modality settings with different missing ratios. The authors state that code will be made available.

Abstract

Addressing missing modalities presents a critical challenge in multimodal learning. Current approaches focus on developing models that can handle modality-incomplete inputs during inference, assuming that the full set of modalities is available for all the data during training. This reliance on full-modality data for training limits the use of abundant modality-incomplete samples that are often encountered in practical settings. In this paper, we propose a robust universal model with modality reconstruction and model personalization, which can effectively tackle missing modalities at both the training and testing stages. Our method leverages a multimodal masked autoencoder to reconstruct the missing modalities and masked patches simultaneously, incorporating an innovative distribution approximation mechanism to fully utilize both modality-complete and modality-incomplete data. The reconstructed modalities then contribute to our designed data-model co-distillation scheme to guide the model learning in the presence of missing modalities. Moreover, we propose a CLIP-driven hyper-network to personalize partial model parameters, enabling the model to adapt to each distinct missing modality scenario. Our method has been extensively validated on two brain tumor segmentation benchmarks. Experimental results demonstrate the promising performance of our method, which consistently exceeds previous state-of-the-art approaches under the all-stage missing modality settings with different missing ratios. Code will be available.
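To make the idea of reconstructing "the missing modality and masked patches simultaneously" more concrete, the following is a minimal sketch of how an input mask for a multimodal masked autoencoder might be built. It is an illustration under stated assumptions, not the authors' implementation: the function name `build_patch_mask`, the `mask_ratio` parameter, and the uniform random masking policy are hypothetical, and the four modality names are the standard BraTS MRI sequences rather than something stated in the abstract.

```python
import random

# The four MRI sequences provided by the BraTS benchmarks (assumed here;
# the abstract itself does not list them).
MODALITIES = ("FLAIR", "T1", "T1ce", "T2")


def build_patch_mask(available, num_patches, mask_ratio, rng):
    """Return {modality: [bool per patch]}, where True means "hidden".

    Hypothetical policy for illustration only:
      * a missing modality is hidden entirely, so the autoencoder must
        reconstruct all of it;
      * an available modality has a random fraction `mask_ratio` of its
        patches hidden, as in ordinary masked autoencoding.
    """
    mask = {}
    for name in MODALITIES:
        if name not in available:
            mask[name] = [True] * num_patches
        else:
            num_hidden = int(mask_ratio * num_patches)
            hidden = set(rng.sample(range(num_patches), num_hidden))
            mask[name] = [i in hidden for i in range(num_patches)]
    return mask


# Example: a sample where only FLAIR and T2 were acquired.
rng = random.Random(0)
mask = build_patch_mask({"FLAIR", "T2"}, num_patches=8, mask_ratio=0.5, rng=rng)
```

With a mask of this kind, one reconstruction objective can cover both cases: fully hidden modalities and partially hidden ones are all treated as targets to be predicted from the visible patches.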

Paper Structure

This paper contains 13 sections, 7 equations, 7 figures, 4 tables.

Figures (7)

  • Figure 1: Our all-stage missing modality setting aims to address modality-incomplete inputs during both the training and inference phases simultaneously. Moreover, it can flexibly handle any ratio of full-modality data within the training set.
  • Figure 2: Overview of our proposed universal model for dealing with all-stage missing modality. We first propose a distribution approximation mechanism for robust modality reconstruction (blue). The data-model co-distillation scheme is then designed to use the reconstructed full modality to guide the model learning, in which a CLIP-driven hyper-network is proposed to tackle distribution heterogeneity (green).
  • Figure 3: Comparison with other missing modality methods trained with 100% full-modality (FM) data on BraTS2018. The average DSC across 15 modality combinations is reported.
  • Figure 4: Visualization of reconstructed training modalities. Row 1 shows cases where the ground truth (GT) is available during pre-training, while row 2 shows cases where the GT is absent. Columns 1-3 show the vanilla multimodal MAE, ours, and the GT, respectively.
  • Figure 5: Results for varying full modality ratio in training on BraTS2018.
  • ...and 2 more figures
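
The "15 modality combinations" mentioned in the Figure 3 caption correspond to the non-empty subsets of four input modalities, since 2^4 - 1 = 15. The short sketch below enumerates them; the modality names are the standard BraTS MRI sequences and are an assumption here, as the captions do not list them, and `modality_combinations` is a hypothetical helper name.

```python
from itertools import combinations

# Standard BraTS MRI sequences (assumed; not listed in the captions above).
MODALITIES = ("FLAIR", "T1", "T1ce", "T2")


def modality_combinations(modalities):
    """Return every non-empty subset of `modalities` as a tuple."""
    return [
        subset
        for size in range(1, len(modalities) + 1)
        for subset in combinations(modalities, size)
    ]


combos = modality_combinations(MODALITIES)
print(len(combos))  # 15
```

Each tuple is one test-time scenario, from a single available modality up to all four.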