Mamba Goes HoME: Hierarchical Soft Mixture-of-Experts for 3D Medical Image Segmentation

Szymon Płotka, Gizem Mert, Maciej Chrabaszcz, Ewa Szczurek, Arkadiusz Sitek

TL;DR

The paper addresses efficient 3D medical image segmentation across multiple modalities by introducing HoME, a two-level hierarchical Soft Mixture-of-Experts (SMoE) routing layer built atop the memory-efficient Mamba backbone. HoME partitions tokens into local groups for intra-group expert routing, then fuses the group outputs with a global second-level SMoE to capture cross-group context. The stated complexities are $O(B N_i d)$ for Mamba and $O(B G_i (E_i + E_{2,i}) L_i d)$ for HoME. The proposed Mamba-HoME architecture reports state-of-the-art segmentation performance on CT, MRI, and ultrasound (US) datasets, with strong generalization and favorable memory characteristics, especially when pretrained in a supervised manner. By combining hierarchical local processing with global context refinement, the approach targets multi-modal clinical workflows and may inform broader hierarchical data modeling.

Abstract

In recent years, artificial intelligence has significantly advanced medical image segmentation. Nonetheless, challenges remain, including efficient 3D medical image processing across diverse modalities and handling data variability. In this work, we introduce Hierarchical Soft Mixture-of-Experts (HoME), a two-level token-routing layer for efficient long-context modeling, specifically designed for 3D medical image segmentation. Built on the Mamba Selective State Space Model (SSM) backbone, HoME enhances sequential modeling through adaptive expert routing. In the first level, a Soft Mixture-of-Experts (SMoE) layer partitions input sequences into local groups, routing tokens to specialized per-group experts for localized feature extraction. The second level aggregates these outputs through a global SMoE layer, enabling cross-group information fusion and global context refinement. This hierarchical design, combining local expert routing with global expert refinement, enhances generalizability and segmentation performance, surpassing state-of-the-art results across datasets from the three most widely used 3D medical imaging modalities and varying data qualities. The code is publicly available at https://github.com/gmum/MambaHoME.
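The two-level routing described in the abstract can be illustrated with a small NumPy sketch. This is not the authors' implementation (see the linked repository for that); it only assumes the generic Soft MoE recipe, in which dispatch weights are a softmax over tokens and combine weights are a softmax over slots. The function names `soft_moe` and `home_layer_sketch`, the toy one-layer `tanh` experts, and the residual refinement at the second level are all assumptions made for illustration.

```python
import numpy as np

def softmax(x, axis):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def soft_moe(x, phi, w):
    """Generic Soft MoE step (illustrative, not the paper's exact layer).

    x:   (n, d) tokens
    phi: (d, E, S) slot parameters for E experts with S slots each
    w:   (E, d, d) toy expert weights (assumption: one linear map + tanh)
    Returns the processed slots (E*S, d) and combine weights (n, E*S).
    """
    n, d = x.shape
    num_experts, slots_per_expert = phi.shape[1], phi.shape[2]
    logits = np.einsum("nd,des->nes", x, phi).reshape(n, -1)
    dispatch = softmax(logits, axis=0)  # each slot is a convex mix of tokens
    combine = softmax(logits, axis=1)   # each token is a convex mix of slots
    slots = (dispatch.T @ x).reshape(num_experts, slots_per_expert, d)
    processed = np.tanh(np.einsum("esd,edf->esf", slots, w)).reshape(-1, d)
    return processed, combine

def home_layer_sketch(tokens, num_groups, phi1, w1, phi2, w2):
    """Hypothetical two-level layout: local per-group routing, then global routing."""
    n, d = tokens.shape
    assert n % num_groups == 0, "tokens must split evenly into groups"
    groups = tokens.reshape(num_groups, n // num_groups, d)
    slot_outs, combines = [], []
    for g in groups:  # level 1: intra-group routing with shared local experts
        processed, combine = soft_moe(g, phi1, w1)
        slot_outs.append(processed)
        combines.append(combine)
    all_slots = np.concatenate(slot_outs, axis=0)  # (G*E*S, d)
    # Level 2: route the aggregated slots through global experts.
    processed2, combine2 = soft_moe(all_slots, phi2, w2)
    refined = all_slots + combine2 @ processed2    # residual is an assumption
    refined = refined.reshape(num_groups, -1, d)
    # Map refined slots back to tokens with each group's combine weights.
    out = np.stack([c @ r for c, r in zip(combines, refined)])
    return out.reshape(n, d)

rng = np.random.default_rng(0)
d, E, S = 8, 2, 1
tokens = rng.normal(size=(16, d))
phi1, w1 = rng.normal(size=(d, E, S)), rng.normal(size=(E, d, d))
phi2, w2 = rng.normal(size=(d, 2 * E, S)), rng.normal(size=(2 * E, d, d))
out = home_layer_sketch(tokens, num_groups=4, phi1=phi1, w1=w1, phi2=phi2, w2=w2)
print(out.shape)
```

In this sketch the first level only ever mixes tokens within a group, so cross-group information can enter solely through the second-level routing over the concatenated slots, which mirrors the local-then-global split the abstract describes.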

Paper Structure

This paper contains 30 sections, 26 equations, 12 figures, 26 tables.

Figures (12)

  • Figure 1: An overview of the HoME layer and Mamba-HoME Block design. (a) The HoME layer operates on $G_I$ groups of $K_I$ tokens. $\mathrm{Router}_1$ routes each group to $E$ local experts for intra-group feature extraction, producing aggregated slot representations. $\mathrm{Router}_2$ routes these aggregated slots to $2E$ experts for inter-group communication and global refinement. (b) The Mamba-HoME Block combines a Gated Spatial Convolution (GSC) module, Mamba for efficient long-sequence processing, and hierarchical expert processing (HoME). Dynamic Tanh is used for normalization to improve gradient stability and efficiency.
  • Figure 2: Qualitative segmentation results from top to bottom: CT, MRI, and 3D US. From left to right, each column shows the input slice, ground truth, the proposed Mamba-HoME, and the five next best-performing methods.
  • Figure 3: Qualitative segmentation results from top to bottom: CT, MRI, and 3D US. From left to right, each column shows the input slice, ground truth, our proposed pre-trained Mamba-HoME, Mamba-HoME trained from scratch, and the baseline SegMamba.
  • Figure 4: Qualitative comparison of Mamba-HoME and SegMamba on abdominal CT scans from PANORAMA test set. The images highlight the impact of the HoME layer added to the baseline SegMamba model, with green (PDAC) and red (pancreas) annotations indicating segmentation differences. Mamba-HoME demonstrates robustness and improved accuracy in detecting both small and large anatomical structures, like tumors, compared to SegMamba alone. Please note that we show Mamba-HoME results trained from scratch.
  • Figure 5: Qualitative segmentation results for PDAC (green) and the pancreas (red) provided by Mamba-HoME and the next three top-performing methods. The first three rows display cases from the MSD Pancreas dataset, while the last two rows show cases from the in-house dataset. Please note, we show Mamba-HoME results trained from scratch.
  • ...and 7 more figures
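The Figure 1 caption mentions Dynamic Tanh as the normalization used in the Mamba-HoME Block. As a minimal sketch, assuming this refers to the commonly cited DyT formulation, an elementwise $\gamma \odot \tanh(\alpha x) + \beta$ with a learnable scalar $\alpha$ and per-channel $\gamma$, $\beta$, it could look as follows. The function name and parameter values are illustrative and not taken from the paper.

```python
import numpy as np

def dynamic_tanh(x, alpha, gamma, beta):
    """Elementwise DyT-style transform (assumed form): gamma * tanh(alpha * x) + beta.

    x:     (..., d) activations
    alpha: scalar controlling how quickly the tanh saturates
    gamma: (d,) per-channel scale
    beta:  (d,) per-channel shift
    """
    return gamma * np.tanh(alpha * x) + beta

x = np.array([[-100.0, 0.0, 100.0]])
y = dynamic_tanh(x, alpha=0.5, gamma=np.ones(3), beta=np.zeros(3))
print(y)
```

Unlike layer normalization, this transform needs no per-token mean or variance; extreme activations are simply squashed into the range set by `gamma` and `beta`.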