LKM-UNet: Large Kernel Vision Mamba UNet for Medical Image Segmentation

Jinhong Wang, Jintai Chen, Danny Chen, Jian Wu

TL;DR

This paper addresses the challenge of achieving large receptive fields for medical image segmentation without the quadratic cost of self-attention. It introduces LKM-UNet, a UNet-inspired architecture built from large-kernel Mamba-based LM blocks, which combine pixel-level (PiM) and patch-level (PaM) SSMs in a bidirectional (BiM), hierarchical design to model both local and global dependencies with linear complexity. In experiments on the Abdomen CT (3D) and Abdomen MR (2D) datasets, LKM-UNet outperforms CNN-, Transformer-, and U-Mamba-based baselines, and ablation studies confirm the value of the PiM, PaM, and BiM components as well as of larger kernel sizes. The work demonstrates the practical value of large receptive fields for accurate organ segmentation, and the code is publicly available.
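The hierarchical pixel-level and patch-level design can be illustrated with a minimal sketch. It assumes, beyond what the summary states, that "pixel-level" means one token sequence per non-overlapping window and that "patch-level" means one pooled token per window; the function names and the mean pooling are hypothetical choices made for illustration, not the authors' implementation.

```python
def split_into_windows(grid, k):
    """Split a 2D grid (list of rows) into non-overlapping k x k windows.

    Windows are returned in row-major order, each flattened row by row.
    """
    height, width = len(grid), len(grid[0])
    if height % k or width % k:
        raise ValueError("grid sides must be divisible by the window size")
    windows = []
    for top in range(0, height, k):
        for left in range(0, width, k):
            windows.append(
                [grid[r][c] for r in range(top, top + k) for c in range(left, left + k)]
            )
    return windows


def pixel_level_sequences(grid, k):
    # Assumption: one sequence per window, so neighbouring pixels stay
    # close together in the sequence a sequence model would scan.
    return split_into_windows(grid, k)


def patch_level_sequence(grid, k):
    # Assumption: each window is summarised by its mean, giving one short
    # sequence that spans the whole grid.
    return [sum(w) / len(w) for w in split_into_windows(grid, k)]


grid = [[r * 4 + c for c in range(4)] for r in range(4)]
pixel_seqs = pixel_level_sequences(grid, 2)
patch_seq = patch_level_sequence(grid, 2)
print(pixel_seqs[0])  # first window, flattened
print(patch_seq)      # one pooled value per window
```

In this sketch the pixel-level sequences carry local detail inside each window, while the much shorter patch-level sequence relates windows to one another across the whole grid.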

Abstract

In clinical practice, medical image segmentation provides useful information on the contours and dimensions of target organs or tissues, facilitating improved diagnosis, analysis, and treatment. In the past few years, convolutional neural networks (CNNs) and Transformers have dominated this area, but they still suffer from either limited receptive fields or costly long-range modeling. Mamba, a State Space Sequence Model (SSM), recently emerged as a promising paradigm for long-range dependency modeling with linear complexity. In this paper, we introduce a Large Kernel Vision Mamba U-shape Network, or LKM-UNet, for medical image segmentation. A distinguishing feature of LKM-UNet is its use of large Mamba kernels, which excel at local spatial modeling compared to small-kernel CNNs and Transformers while remaining more efficient at global modeling than self-attention with its quadratic complexity. Additionally, we design a novel hierarchical and bidirectional Mamba block to further enhance Mamba's global and neighborhood spatial modeling capability for vision inputs. Comprehensive experiments demonstrate the feasibility and effectiveness of using large Mamba kernels to achieve large receptive fields. Code is available at https://github.com/wjh892521292/LKM-UNet.
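The idea of bidirectional scanning with linear complexity can be sketched as follows. This is not Mamba's selective scan: it uses a fixed scalar linear recurrence as a stand-in, and the `decay` and `gain` parameters, the function names, and the choice to sum the two directions are all assumptions made for illustration.

```python
def linear_scan(xs, decay=0.5, gain=1.0):
    """Run h_t = decay * h_{t-1} + gain * x_t and return every h_t.

    One pass over the sequence, so the cost grows linearly with its length.
    """
    state = 0.0
    outputs = []
    for x in xs:
        state = decay * state + gain * x
        outputs.append(state)
    return outputs


def bidirectional_scan(xs, decay=0.5, gain=1.0):
    # Forward pass: each position sees itself and everything before it.
    forward = linear_scan(xs, decay, gain)
    # Backward pass: scan the reversed sequence, then restore the order,
    # so each position sees itself and everything after it.
    backward = linear_scan(xs[::-1], decay, gain)[::-1]
    # Assumption: the two directions are combined by summation.
    return [f + b for f, b in zip(forward, backward)]


impulse_at_end = [0.0, 0.0, 1.0]
print(linear_scan(impulse_at_end))         # forward only
print(bidirectional_scan(impulse_at_end))  # forward + backward
```

With a forward-only scan, the first position receives nothing from the impulse at the end of the sequence. After the backward pass is added, every position is influenced by it, and the cost is still two linear passes.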

Paper Structure

This paper contains 14 sections, 6 equations, 3 figures, 3 tables.

Figures (3)

  • Figure 1: An overview of our proposed LKM-UNet.
  • Figure 2: (a) Receptive field comparison among CNN, Transformer, and our proposed LKM-UNet. CNNs often use small kernels (like $3 \times 3$), and Transformers often use $7 \times 7$ sized kernels (windows). Our LKM-UNet can scale up kernel size to $40 \times 40$. (b) Scanning order comparison of vanilla Mamba vs. our proposed bidirectional Mamba.
  • Figure 3: Effective receptive field visualization among CNN, Transformer, U-Mamba and our proposed LKM-UNet.
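The kernel sizes quoted in the Figure 2 caption can be put in perspective with a little arithmetic. The sketch below is an idealized count, not a measurement. It assumes one token per pixel, counts token pairs as a proxy for self-attention cost within a window, and counts one step per token for a sequential scan. The function names are hypothetical.

```python
def tokens_in_window(side):
    # A side x side window holds side**2 tokens (assuming one token per pixel).
    return side * side


def attention_pair_count(side):
    # Self-attention relates every token to every token: quadratic in tokens.
    n = tokens_in_window(side)
    return n * n


def scan_step_count(side):
    # A sequential scan visits each token once: linear in tokens.
    return tokens_in_window(side)


for side in (3, 7, 40):
    print(
        f"{side}x{side}: tokens={tokens_in_window(side)}, "
        f"attention pairs={attention_pair_count(side)}, "
        f"scan steps={scan_step_count(side)}"
    )
```

Under these assumptions, a 40 x 40 window contains 1,600 tokens. Pairwise attention over it would involve 2,560,000 token pairs, while a scan takes 1,600 steps per direction. This gap is why window sizes that are impractical for attention remain affordable for a linear-complexity model.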