UNet with Self-Adaptive Mamba-Like Attention and Causal-Resonance Learning for Medical Image Segmentation
Saqib Qamar, Mohd Fazil, Parvez Ahmad, Shakir Khan, Abu Taha Zamani
TL;DR
This work tackles the trade-off between efficiency and accuracy in medical image segmentation by introducing SAMA-UNet, which combines Self-Adaptive Mamba-like Aggregated Attention (SAMA) with a Causal-Resonance Multi-Scale Module (CR-MSM). SAMA provides a dynamic, channel-split attention mechanism that fuses local and global features with linear complexity, while CR-MSM preserves causal, multi-scale information flow during encoder–decoder fusion. Across MRI, CT, and endoscopy datasets, SAMA-UNet achieves state-of-the-art or competitive DSC and NSD scores with lower computational overhead than many transformer- and Mamba-based baselines. The work demonstrates both technical and clinical relevance by delivering high segmentation accuracy with scalable efficiency, and it outlines concrete future directions for memory-efficient 3D extensions and real-time deployment.
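The summary does not give SAMA's equations, so the following is only a generic sketch of the idea it describes: split the channels, send one half through a local operator and the other through a global attention whose cost is linear in the number of pixels, then fuse the two branches with input-dependent weights. It is not the authors' SAMA block and contains no Mamba/SSM scan. Every choice here is an assumption made for illustration: the even channel split, a 3x3 mean filter as the local branch, kernelized linear attention with the feature map elu(z) + 1 and identity Q/K/V projections as the global branch, and a softmax gate over globally pooled features. The names `local_mix`, `linear_attention`, `channel_split_block`, and `gate_w` are hypothetical.

```python
import numpy as np

def local_mix(x):
    """Hypothetical local branch: 3x3 mean filter per channel. x: (H, W, C)."""
    H, W, _ = x.shape
    p = np.pad(x, ((1, 1), (1, 1), (0, 0)), mode="edge")
    out = np.zeros_like(x)
    for dy in range(3):
        for dx in range(3):
            out += p[dy:dy + H, dx:dx + W]
    return out / 9.0

def linear_attention(tokens):
    """Kernelized linear attention over N tokens of width d.

    Cost is O(N * d^2) instead of the O(N^2 * d) of softmax attention, because
    K^T V (a d x d matrix) is formed once and reused for every query.
    Identity projections for Q, K, V keep the sketch small (an assumption).
    """
    phi = lambda z: np.where(z > 0, z + 1.0, np.exp(z))  # elu(z) + 1, always positive
    q, k, v = phi(tokens), phi(tokens), tokens
    kv = k.T @ v                      # (d, d) summary of all tokens
    norm = q @ k.sum(axis=0)          # (N,) per-query normalizer
    return (q @ kv) / norm[:, None]

def channel_split_block(x, gate_w):
    """Hypothetical channel-split block. x: (H, W, C) with C even; gate_w: (C, 2)."""
    H, W, C = x.shape
    half = C // 2
    y_local = local_mix(x[..., :half])
    y_global = linear_attention(x[..., half:].reshape(H * W, half)).reshape(H, W, half)
    # Dynamic weighting: softmax over two branch scores computed from pooled features.
    scores = x.mean(axis=(0, 1)) @ gate_w
    w = np.exp(scores - scores.max())
    w /= w.sum()
    fused = np.concatenate([w[0] * y_local, w[1] * y_global], axis=-1)
    return x + fused  # residual connection keeps the input shape
```

The design point the sketch isolates is the linear cost: a softmax version of the global branch would build an (H·W) x (H·W) matrix, whereas here the largest intermediate is a half x half matrix regardless of image size.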
Abstract
Medical image segmentation plays an important role in many clinical applications, yet existing deep learning models trade efficiency against accuracy. Convolutional Neural Networks (CNNs) capture local detail well but miss global context, whereas Transformers model global context at a high computational cost. State Space Sequence Models (SSMs) have recently shown potential for capturing long-range dependencies with linear complexity; however, their direct use in medical image segmentation remains limited because of their incompatibility with image structure and their autoregressive assumptions. To overcome these challenges, we propose SAMA-UNet, a novel U-shaped architecture with two key innovations. First, the Self-Adaptive Mamba-like Aggregated Attention (SAMA) block adaptively integrates local and global features through dynamic attention weighting, enabling an efficient representation of complex anatomical patterns. Second, the Causal-Resonance Multi-Scale Module (CR-MSM) improves encoder-decoder interactions by adjusting feature resolution and causal dependencies across scales, enhancing the semantic alignment between low- and high-level features. Extensive experiments on MRI, CT, and endoscopy datasets demonstrate that SAMA-UNet consistently outperforms CNN-, Transformer-, and Mamba-based methods. It achieves 85.38% DSC and 87.82% NSD on BTCV, 92.16% DSC and 96.54% NSD on ACDC, 67.14% DSC and 68.70% NSD on EndoVis17, and 84.06% DSC and 88.47% NSD on ATLAS23, establishing new benchmarks across modalities. These results confirm that SAMA-UNet combines efficiency and accuracy, making it a promising solution for real-world clinical segmentation tasks. The source code is available on GitHub.
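The DSC figures above are Dice similarity coefficients, 2|P∩G| / (|P| + |G|) for a predicted mask P and ground-truth mask G. A minimal sketch of how such a score is commonly computed for a multi-class label map follows; the averaging over foreground classes and the convention that two empty masks score 1.0 are assumptions, since the abstract does not state the paper's exact evaluation protocol. NSD additionally needs a surface-distance tolerance and is not shown. The function names `dice_score` and `mean_dice` are hypothetical.

```python
import numpy as np

def dice_score(pred, target):
    """Dice similarity coefficient for two binary masks: 2|P & G| / (|P| + |G|)."""
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    denom = pred.sum() + target.sum()
    if denom == 0:
        return 1.0  # assumed convention: both masks empty counts as perfect agreement
    inter = np.logical_and(pred, target).sum()
    return float(2.0 * inter / denom)

def mean_dice(pred_labels, gt_labels, classes):
    """Average the per-class Dice over the given foreground label ids."""
    return float(np.mean([dice_score(pred_labels == c, gt_labels == c) for c in classes]))
```

For example, `dice_score([[1, 1, 0]], [[1, 0, 0]])` has one overlapping pixel out of three labeled pixels in total, giving 2/3.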
