A Foundation Model for Brain Lesion Segmentation with Mixture of Modality Experts
Xinru Zhang, Ni Ou, Berke Doga Basaran, Marco Visentin, Mengyun Qiao, Renyang Gu, Cheng Ouyang, Yaou Liu, Paul M. Matthews, Chuyang Ye, Wenjia Bai
TL;DR
This work tackles the difficulty of brain lesion segmentation across diverse MRI modalities by introducing MoME, a foundation-model-style framework that couples modality-specific experts with a hierarchical gating network to adaptively fuse predictions. A curriculum learning strategy is employed to prevent expert degeneration and gradually shift from modality specialization to collaborative inference, enabling robust cross-modality generalization. Evaluated on nine datasets spanning five modalities and eight lesion types (6,585 annotated 3D images), MoME outperforms competing foundation models and shows strong generalization to unseen data while remaining more memory-efficient than training multiple task-specific nnU-Nets. The approach demonstrates practical potential for deploying a single, versatile segmentation system in real-world clinical settings and is accompanied by public code.
Abstract
Brain lesion segmentation plays an essential role in neurological research and diagnosis. As brain lesions can be caused by various pathological alterations, different types of brain lesions tend to manifest with different characteristics on different imaging modalities. Due to this complexity, brain lesion segmentation methods are often developed in a task-specific manner, with a separate segmentation model built for each lesion type and imaging modality. However, task-specific models require the lesion type and imaging modality to be known in advance, which complicates their deployment in real-world scenarios. In this work, we propose a universal foundation model for 3D brain lesion segmentation, which can automatically segment different types of brain lesions for input data of various imaging modalities. We formulate a novel Mixture of Modality Experts (MoME) framework with multiple expert networks attending to different imaging modalities. A hierarchical gating network combines the expert predictions and fosters collaboration among the experts. Furthermore, we introduce a curriculum learning strategy during training to avoid the degeneration of each expert network and preserve their specialization. We evaluated the proposed method on nine brain lesion datasets, encompassing five imaging modalities and eight lesion types. The results show that our model outperforms state-of-the-art universal models and provides promising generalization to unseen datasets.
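The abstract does not give the exact form of the gating network or the curriculum schedule, so the following is only a minimal sketch of the general idea under stated assumptions: the gate is taken to output one logit per expert per voxel, the fusion is assumed to be a softmax-weighted sum, and the curriculum is assumed to be a linear schedule that moves the loss weight from per-expert specialization toward the fused (collaborative) prediction. The names `fuse_expert_predictions`, `curriculum_weight`, and `combined_loss` are hypothetical and are not taken from the released code.

```python
import numpy as np

def softmax(x, axis=0):
    # Numerically stable softmax along the expert axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def fuse_expert_predictions(expert_probs, gate_logits):
    """Voxel-wise weighted sum of expert outputs (assumed fusion rule).

    expert_probs: array of shape (E, D, H, W), lesion probability per expert.
    gate_logits:  array of shape (E, D, H, W), unnormalized gate scores.
    Returns an array of shape (D, H, W).
    """
    weights = softmax(gate_logits, axis=0)  # weights sum to 1 over experts
    return (weights * expert_probs).sum(axis=0)

def curriculum_weight(epoch, total_epochs):
    """Hypothetical linear schedule: 1.0 at the first epoch, 0.0 at the last."""
    return max(0.0, 1.0 - epoch / max(1, total_epochs - 1))

def combined_loss(specialization_loss, collaboration_loss, epoch, total_epochs):
    """Assumed blend: early epochs emphasize each expert's own modality,
    later epochs emphasize the gated, fused prediction."""
    lam = curriculum_weight(epoch, total_epochs)
    return lam * specialization_loss + (1.0 - lam) * collaboration_loss

# Usage: two experts on a tiny 2x2x2 volume with an uninformative gate.
probs = np.stack([np.full((2, 2, 2), 0.2), np.full((2, 2, 2), 0.8)])
fused = fuse_expert_predictions(probs, np.zeros_like(probs))
```

With equal gate logits the fused output is the plain average of the experts; a trained gate would instead shift weight toward the expert best suited to the input modality at each voxel.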
