MSM-Seg: A Modality-and-Slice Memory Framework with Category-Agnostic Prompting for Multi-Modal Brain Tumor Segmentation

Yuxiang Luo, Qing Xu, Hai Huang, Yuqi Ouyang, Zhen Chen, Wenting Duan

TL;DR

MSM-Seg addresses multi-modal brain tumor segmentation with a dual-memory framework that explicitly models cross-modal and inter-slice dependencies. Its core contributions are a modality-and-slice memory attention (MSMA), a multi-scale category-agnostic prompt encoder (MCP-Encoder), and a modality-adaptive fusion decoder (MF-Decoder), which together enable guided decoding without subregion-specific prompts. Experiments on BraTS-METS and BraTS-AGPT show higher Dice scores and more precise boundaries than both classical and prompt-based baselines, supporting the effectiveness of memory-driven cross-modal integration. By removing the need for category-specific prompts, the framework aims to reduce the prompting burden on clinicians and to improve segmentation reliability across diverse tumor presentations.

Abstract

Multi-modal brain tumor segmentation is critical for clinical diagnosis, and it requires accurate identification of distinct internal anatomical subregions. While recent prompt-based segmentation paradigms enable interactive experiences for clinicians, existing methods ignore cross-modal correlations and rely on labor-intensive category-specific prompts, limiting their applicability in real-world scenarios. To address these issues, we propose MSM-Seg, a framework for multi-modal brain tumor segmentation. MSM-Seg introduces a novel dual-memory segmentation paradigm that synergistically integrates multi-modal and inter-slice information with efficient category-agnostic prompts for brain tumor understanding. To this end, we first devise a modality-and-slice memory attention (MSMA) to exploit the cross-modal and inter-slice relationships among the input scans. Then, we propose a multi-scale category-agnostic prompt encoder (MCP-Encoder) to provide tumor region guidance for decoding. Moreover, we devise a modality-adaptive fusion decoder (MF-Decoder) that leverages the complementary decoding information across different modalities to improve segmentation accuracy. Extensive experiments on different MRI datasets demonstrate that our MSM-Seg framework outperforms state-of-the-art methods in multi-modal metastases and glioma tumor segmentation. The code is available at https://github.com/xq141839/MSM-Seg.
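
To make the idea of attending over modality and slice memory more concrete, the following is a minimal sketch of generic scaled dot-product attention from the current slice's feature vector to a small memory bank. It is not the authors' MSMA: the function names (`softmax`, `memory_attention`), the toy 4-dimensional features, the use of memory entries as both keys and values, and the residual addition are all assumptions chosen for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def memory_attention(query, memory):
    """Attend from one query vector to a list of memory vectors.

    Assumption: each memory vector serves as both key and value, and the
    attended result is added back to the query as a residual.
    """
    scale = math.sqrt(len(query))
    # Scaled dot-product similarity between the query and each memory entry.
    scores = [sum(q * k for q, k in zip(query, entry)) / scale for entry in memory]
    weights = softmax(scores)
    # Weighted sum of memory entries, computed per feature dimension.
    attended = [
        sum(w * entry[i] for w, entry in zip(weights, memory))
        for i in range(len(query))
    ]
    enhanced = [q + a for q, a in zip(query, attended)]
    return enhanced, weights


# Hypothetical toy features: the current slice in one modality, two entries
# remembered from other modalities, and one entry from the preceding slice.
current = [1.0, 0.0, 0.0, 0.0]
modality_memory = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]]
slice_memory = [[0.0, 0.0, 1.0, 0.0]]

enhanced, weights = memory_attention(current, modality_memory + slice_memory)
print(weights)
print(enhanced)
```

In this toy setup the memory entry most similar to the current feature receives the largest weight, which is the basic mechanism by which a memory bank can inject context from other modalities and neighboring slices into the current embedding.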

Paper Structure

This paper contains 23 sections, 10 equations, 3 figures, 6 tables, 1 algorithm.

Figures (3)

  • Figure 1: Comparison of our MSM-Seg and existing multi-modal brain tumor segmentation paradigms. (a) Classical multi-modal segmentation networks. (b) The recent SAM2 segmentation paradigm, which relies on labor-intensive, category-specific prompts. (c) Our MSM-Seg, which efficiently leverages modality and slice memory with category-agnostic prompting for multi-modal brain tumor segmentation.
  • Figure 2: (a) Overview of the MSM-Seg framework for multi-modal brain tumor segmentation. At each step, MSM-Seg retrieves modality and slice memory, with (b) MSMA to create memory-enhanced embeddings, (c) MCP-Encoder to generate tumor region guidance using category-agnostic prompts, and (d) MF-Decoder to produce brain tumor segmentation masks. MSM-Seg effectively exploits the complementary contextual understanding conditioned on the preceding slice and modality memory.
  • Figure 3: Visualization of multi-modal brain tumor segmentation on the BraTS-METS and BraTS-AGPT datasets. Our MSM-Seg exhibits the best results, accurately delineating hierarchical tumor subregions (ET, NETC, and SNFH) with precise boundaries while having fewer false positives and better preservation of tumor morphology compared to state-of-the-art methods.