A Foundation Model for Brain Lesion Segmentation with Mixture of Modality Experts

Xinru Zhang, Ni Ou, Berke Doga Basaran, Marco Visentin, Mengyun Qiao, Renyang Gu, Cheng Ouyang, Yaou Liu, Paul M. Matthews, Chuyang Ye, Wenjia Bai

TL;DR

This work addresses brain lesion segmentation across diverse MRI modalities by introducing MoME (Mixture of Modality Experts), a foundation model that couples modality-specific expert networks with a hierarchical gating network that adaptively fuses their predictions. A curriculum learning strategy prevents expert degeneration and gradually shifts training from modality specialization to collaborative inference, supporting cross-modality generalization. Evaluated on nine datasets spanning five modalities and eight lesion types (6,585 annotated 3D images), MoME outperforms competing foundation models and generalizes well to unseen datasets, while remaining more memory-efficient than training multiple task-specific nnU-Nets. The approach shows practical potential for deploying a single, versatile segmentation system in clinical settings, and the code is publicly available.

Abstract

Brain lesion segmentation plays an essential role in neurological research and diagnosis. As brain lesions can be caused by various pathological alterations, different types of brain lesions tend to manifest with different characteristics on different imaging modalities. Due to this complexity, brain lesion segmentation methods are often developed in a task-specific manner. A specific segmentation model is developed for a particular lesion type and imaging modality. However, the use of task-specific models requires predetermination of the lesion type and imaging modality, which complicates their deployment in real-world scenarios. In this work, we propose a universal foundation model for 3D brain lesion segmentation, which can automatically segment different types of brain lesions for input data of various imaging modalities. We formulate a novel Mixture of Modality Experts (MoME) framework with multiple expert networks attending to different imaging modalities. A hierarchical gating network combines the expert predictions and fosters expertise collaboration. Furthermore, we introduce a curriculum learning strategy during training to avoid the degeneration of each expert network and preserve their specialization. We evaluated the proposed method on nine brain lesion datasets, encompassing five imaging modalities and eight lesion types. The results show that our model outperforms state-of-the-art universal models and provides promising generalization to unseen datasets.
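
The two mechanisms named in the abstract, gated fusion of expert predictions and a curriculum that moves from specialization to collaboration, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. It assumes a single-level, voxel-wise softmax gate (the paper's gating network is hierarchical), five experts (one per modality), and a linear curriculum schedule; all function names and tensor shapes are hypothetical.

```python
import numpy as np


def softmax(x, axis=0):
    """Numerically stable softmax along the given axis."""
    shifted = x - x.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


def fuse_expert_predictions(expert_logits, gate_logits):
    """Combine expert outputs with voxel-wise gating weights.

    expert_logits: (E, C, D, H, W), one segmentation logit map per expert.
    gate_logits:   (E, D, H, W), unnormalized gate scores per expert and voxel.
    Returns fused logits of shape (C, D, H, W).
    Assumption: a single-level gate; the paper describes a hierarchical one.
    """
    weights = softmax(gate_logits, axis=0)  # weights sum to 1 over experts
    return (weights[:, None] * expert_logits).sum(axis=0)


def curriculum_weight(epoch, total_epochs):
    """Hypothetical linear ramp from 0 (specialization) to 1 (collaboration)."""
    return min(1.0, max(0.0, epoch / max(1, total_epochs - 1)))


def curriculum_loss(specialization_loss, collaboration_loss, epoch, total_epochs):
    """Blend the two losses; early epochs favor the specialization term."""
    lam = curriculum_weight(epoch, total_epochs)
    return (1.0 - lam) * specialization_loss + lam * collaboration_loss


# Toy example: 5 experts, 2 classes, a 4x4x4 volume.
rng = np.random.default_rng(0)
experts = rng.normal(size=(5, 2, 4, 4, 4))
gates = rng.normal(size=(5, 4, 4, 4))
fused = fuse_expert_predictions(experts, gates)
print(fused.shape)  # (2, 4, 4, 4)
```

With uniform gate scores the fusion reduces to a plain average of the experts, which is a quick sanity check. The curriculum weight starts at 0, so each expert is first trained on its own objective before the fused prediction dominates the loss.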

Paper Structure (18 sections, 4 equations, 2 figures, 3 tables)

Figures (2)

  • Figure 1: Different paradigms for brain lesion segmentation: a) the traditional paradigm that trains multiple task-specific models; b) the foundation model paradigm that trains a single universal model for multiple tasks; c) the proposed mixture of modality experts (MoME) framework for constructing the foundation model.
  • Figure 2: More detailed analysis of the MoME result on seen datasets. a) A radar chart that compares the average Dice score of foundation models from the perspectives of different modalities and lesion types. b) t-SNE plots of latent spaces for nnU-Net and MoME, where each dot represents a brain image.