MSV-Mamba: A Multiscale Vision Mamba Network for Echocardiography Segmentation

Xiaoxian Yang, Qi Wang, Kaiqi Zhang, Ke Wei, Jun Lyu, Lingchao Chen

TL;DR

This work tackles accurate echocardiography segmentation under noisy, low-resolution conditions by introducing MSV-Mamba, a U-shaped network that combines a cascaded residual encoder with a large-window Mamba-based decoder. Key innovations include LMS decoder blocks that capture global context at near-linear complexity, a Multiscale Attention Aggregation (MSAA) module that fuses multilayer features through dual spatial-channel attention, and hierarchical auxiliary losses that supervise learning at each decoder layer. Results on EchoNet-Dynamic and CAMUS show that the model outperforms the compared methods in left ventricular endocardium and epicardium segmentation, with notable robustness to noise and morphological variation. The approach offers a practical path toward real-time, reliable automatic echocardiography analysis, with potential extension to 3D reconstruction in future work.

Abstract

Ultrasound imaging is challenged by high noise levels, limited spatiotemporal resolution, and complex anatomical structures. These factors significantly hinder a model's ability to capture and analyze structural relationships and dynamic patterns across regions of the heart. Mamba, an emerging state space model, has been widely applied to diverse vision and language tasks. Building on it, this paper introduces a U-shaped deep learning model incorporating a large-window Mamba scale (LMS) module and a hierarchical feature fusion approach for echocardiographic segmentation. First, a cascaded residual block serves as the encoder and incrementally extracts multiscale detailed features. Second, the LMS module is integrated into the decoder to capture global dependencies across regions and enhance the segmentation of complex anatomical structures. Furthermore, the model applies an auxiliary loss at each decoder layer and employs a dual attention mechanism to fuse multilayer features both spatially and across channels, which improves accuracy in delineating complex anatomical structures. Finally, experiments on the EchoNet-Dynamic and CAMUS datasets demonstrate that the model outperforms other methods in both accuracy and robustness. For segmentation of the left ventricular endocardium ($LV_{endo}$), the model achieved optimal values of 95.01 and 93.36, respectively, while for the left ventricular epicardium ($LV_{epi}$), it achieved 87.35 and 87.80, respectively. This represents an improvement of between 0.54 and 1.11 over the best-performing competing model.
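
The idea of supervising every decoder layer with an auxiliary loss can be illustrated with a small sketch. This is not the authors' implementation: the soft Dice loss, the nearest-neighbour upsampling, the per-layer weights, and all function names (`soft_dice_loss`, `upsample_nearest`, `deep_supervision_loss`) are assumptions chosen for illustration, and plain Python lists stand in for tensors.

```python
def soft_dice_loss(pred, target, eps=1e-6):
    """Return 1 - soft Dice for two equally sized 2D maps with values in [0, 1]."""
    inter = sum(p * t for prow, trow in zip(pred, target)
                for p, t in zip(prow, trow))
    total = sum(map(sum, pred)) + sum(map(sum, target))
    return 1.0 - (2.0 * inter + eps) / (total + eps)


def upsample_nearest(pred, factor):
    """Nearest-neighbour upsampling of a 2D map by an integer factor."""
    return [[v for v in row for _ in range(factor)]
            for row in pred for _ in range(factor)]


def deep_supervision_loss(decoder_preds, target, weights):
    """Weighted sum of per-layer losses (hypothetical weighting scheme).

    decoder_preds: list of 2D probability maps, one per decoder layer,
                   each at a resolution that divides the target resolution.
    weights:       one assumed scalar weight per decoder layer.
    """
    total = 0.0
    for pred, w in zip(decoder_preds, weights):
        factor = len(target) // len(pred)  # scale coarse maps up to the mask size
        total += w * soft_dice_loss(upsample_nearest(pred, factor), target)
    return total


# Toy example: a 4x4 mask, a full-resolution prediction, and a 2x2 coarse one.
mask = [[1, 1, 0, 0],
        [1, 1, 0, 0],
        [0, 0, 0, 0],
        [0, 0, 0, 0]]
fine = [row[:] for row in mask]   # perfect full-resolution prediction
coarse = [[1, 0],
          [0, 0]]                 # perfect prediction at half resolution
loss = deep_supervision_loss([fine, coarse], mask, weights=[1.0, 0.5])
```

In this toy case both predictions match the mask once upsampled, so the combined loss is numerically zero. In a real network the coarse outputs would come from intermediate decoder layers, and the weights would be a tuning choice.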

Paper Structure (13 sections, 14 equations, 5 figures, 3 tables)

Figures (5)

  • Figure 1: Overall structure of the MSV-Mamba model.
  • Figure 2: Residual block structure diagram.
  • Figure 3: LMS block structure diagram.
  • Figure 4: MSAA module diagram.
  • Figure 5: Visual comparison of the segmentation results for the CAMUS dataset.