Mamba Goes HoME: Hierarchical Soft Mixture-of-Experts for 3D Medical Image Segmentation
Szymon Płotka, Gizem Mert, Maciej Chrabaszcz, Ewa Szczurek, Arkadiusz Sitek
TL;DR
The paper addresses efficient 3D medical image segmentation across multiple modalities by introducing HoME, a two-level hierarchical Soft Mixture-of-Experts (SMoE) routing layer built on the memory-efficient Mamba backbone. HoME partitions tokens into local groups for intra-group expert routing, then fuses the group outputs with a global second-level SMoE to capture cross-group context. This yields scalable long-context modeling, with complexity $O(B N_i d)$ for Mamba and $O(B G_i (E_i + E_{2,i}) L_i d)$ for HoME. The proposed Mamba-HoME architecture achieves state-of-the-art segmentation performance across CT, MRI, and ultrasound (US) datasets, with strong generalization and favorable memory characteristics, especially when pretrained in a supervised manner. By combining hierarchical local processing with global context refinement, the work advances practical, scalable 3D segmentation, with benefits for multi-modal clinical workflows and potential relevance to broader hierarchical data modeling.
Abstract
In recent years, artificial intelligence has significantly advanced medical image segmentation. Nonetheless, challenges remain, including efficient 3D medical image processing across diverse modalities and handling data variability. In this work, we introduce Hierarchical Soft Mixture-of-Experts (HoME), a two-level token-routing layer for efficient long-context modeling, specifically designed for 3D medical image segmentation. Built on the Mamba Selective State Space Model (SSM) backbone, HoME enhances sequential modeling through adaptive expert routing. In the first level, a Soft Mixture-of-Experts (SMoE) layer partitions input sequences into local groups, routing tokens to specialized per-group experts for localized feature extraction. The second level aggregates these outputs through a global SMoE layer, enabling cross-group information fusion and global context refinement. This hierarchical design, combining local expert routing with global expert refinement, enhances generalizability and segmentation performance, surpassing state-of-the-art results across datasets from the three most widely used 3D medical imaging modalities and varying data qualities. The code is publicly available at https://github.com/gmum/MambaHoME.
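To make the two-level routing described in the abstract concrete, the following is a minimal NumPy sketch, not the authors' implementation (see the linked repository for that). It assumes a standard Soft MoE formulation in which each expert owns one slot, dispatch weights are a softmax over tokens, and combine weights are a softmax over slots. The names `soft_moe` and `home_layer`, the single-matrix `tanh` experts, the sharing of first-level experts across groups, and all sizes are illustrative assumptions.

```python
import numpy as np

def softmax(x, axis):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def soft_moe(x, phi, w):
    """Soft MoE with one slot per expert (an assumption of this sketch).

    x:   (n, d) tokens
    phi: (d, e) slot parameters
    w:   (e, d, d) one weight matrix per expert (hypothetical toy experts)
    """
    logits = x @ phi                     # (n, e) token-slot affinities
    dispatch = softmax(logits, axis=0)   # each slot: convex mix of tokens
    combine = softmax(logits, axis=1)    # each token: convex mix of slot outputs
    slots = dispatch.T @ x               # (e, d) slot inputs
    slot_out = np.tanh(np.einsum("ed,edf->ef", slots, w))  # expert e acts on slot e
    return combine @ slot_out            # (n, d)

def home_layer(x, group_size, phi1, w1, phi2, w2):
    """Two-level sketch: local Soft MoE per group, then a global Soft MoE."""
    n, d = x.shape
    assert n % group_size == 0, "sequence length must be divisible by group size"
    groups = x.reshape(n // group_size, group_size, d)
    # Level 1: routing is restricted to tokens inside each group.
    local = np.stack([soft_moe(g, phi1, w1) for g in groups]).reshape(n, d)
    # Level 2: a global Soft MoE over all tokens fuses information across groups.
    return soft_moe(local, phi2, w2)

rng = np.random.default_rng(0)
n, d, e1, e2 = 16, 8, 4, 2               # toy sizes, chosen arbitrarily
x = rng.normal(size=(n, d))
phi1 = rng.normal(size=(d, e1))
w1 = rng.normal(size=(e1, d, d)) / np.sqrt(d)
phi2 = rng.normal(size=(d, e2))
w2 = rng.normal(size=(e2, d, d)) / np.sqrt(d)
y = home_layer(x, group_size=4, phi1=phi1, w1=w1, phi2=phi2, w2=w2)
print(y.shape)
```

Because the dispatch and combine weights are dense softmaxes rather than hard top-k assignments, every token contributes to every slot within its routing scope, which keeps the layer fully differentiable. The first level limits that scope to a group, and the second level widens it to the whole sequence.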
