Self-Prior Guided Mamba-UNet Networks for Medical Image Super-Resolution

Zexin Ji, Beiji Zou, Xiaoyan Kui, Pierre Vera, Su Ruan

TL;DR

This work addresses medical image super-resolution by overcoming CNNs' local bias and Transformers' computational burden through a self-prior guided Mamba-UNet (SMamba-UNet). It combines a Mamba-based UNet with self-prior learning via brightness perturbation and an improved ISS2D module to capture multi-scale, multi-directional dependencies with linear complexity $O(L)$ in sequence length. The approach uses a composite loss comprising an $L_1$ term and a perceptual term based on a pre-trained VGG19 network, enabling texture and brightness refinement in SR. Experiments on IXI and fastMRI demonstrate state-of-the-art PSNR/SSIM at 2× and 4× upsampling, highlighting efficient long-range modeling and effective self-exemplar learning for clinically relevant SR improvements.
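As a rough illustration of the composite loss described above, the objective can be written as the sum of the two terms. The symbols here are assumptions for illustration and are not taken from the paper: $\hat{y}$ is the super-resolved output, $y$ the high-resolution reference, $\phi$ the feature extractor of the pre-trained VGG19 network, and $\lambda$ a hypothetical weighting factor. The summary does not state the relative weighting or the norm used on the VGG19 features, so both are left generic.

$$
\mathcal{L} = \mathcal{L}_1 + \lambda \, \mathcal{L}_{\mathrm{perc}}, \qquad
\mathcal{L}_1 = \lVert \hat{y} - y \rVert_1, \qquad
\mathcal{L}_{\mathrm{perc}} = \lVert \phi(\hat{y}) - \phi(y) \rVert
$$

Under this reading, the $L_1$ term penalizes pixel-wise intensity errors, while the perceptual term compares the two images in VGG19 feature space.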

Abstract

In this paper, we propose a self-prior guided Mamba-UNet network (SMamba-UNet) for medical image super-resolution. Existing methods are primarily based on convolutional neural networks (CNNs) or Transformers. CNN-based methods fail to capture long-range dependencies, while Transformer-based approaches incur heavy computational cost due to their quadratic complexity. Recently, State Space Models (SSMs), especially Mamba, have emerged as capable of modeling long-range dependencies with linear computational complexity. Inspired by Mamba, our approach learns self-prior multi-scale contextual features within a Mamba-UNet network, which may help to super-resolve low-resolution medical images efficiently. Specifically, we obtain self-priors by perturbing the brightness inpainting of the input image during network training, which lets the network learn detailed texture and brightness information that is beneficial for super-resolution. Furthermore, we combine Mamba with a UNet to mine global features at different levels. We also design an improved 2D-Selective-Scan (ISS2D) module that divides image features into different directional sequences to learn long-range dependencies in multiple directions and adaptively fuses the sequence information to enhance the super-resolved feature representation. Both qualitative and quantitative experimental results demonstrate that our approach outperforms current state-of-the-art methods on two public medical datasets: IXI and fastMRI.
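
To make the idea of "directional sequences" more concrete, the following is a minimal NumPy sketch of how a 2D feature map might be unrolled into several 1D scan orders and folded back. It is not the authors' ISS2D implementation: the function names are hypothetical, the choice of four scan orders (row-major, column-major, and their reverses) is an assumption, no selective-scan (Mamba) step is applied to the sequences, and plain averaging stands in for the adaptive fusion described in the abstract.

```python
import numpy as np


def directional_sequences(feat):
    """Unroll an (H, W, C) feature map into four 1D scan orders.

    Returns an array of shape (4, H*W, C): row-major, column-major,
    reversed row-major, reversed column-major. The four-direction
    layout is an assumption made for illustration.
    """
    h, w, c = feat.shape
    row_major = feat.reshape(h * w, c)
    col_major = feat.transpose(1, 0, 2).reshape(h * w, c)
    return np.stack([row_major, col_major, row_major[::-1], col_major[::-1]])


def merge_sequences(seqs, h, w):
    """Fold four scan-ordered sequences back to (H, W, C) and average them.

    Averaging is a placeholder for the adaptive fusion in the paper.
    """
    c = seqs.shape[-1]
    row_fwd = seqs[0].reshape(h, w, c)
    col_fwd = seqs[1].reshape(w, h, c).transpose(1, 0, 2)
    row_bwd = seqs[2][::-1].reshape(h, w, c)
    col_bwd = seqs[3][::-1].reshape(w, h, c).transpose(1, 0, 2)
    return (row_fwd + col_fwd + row_bwd + col_bwd) / 4.0


# Usage: a toy 2x3 feature map with 4 channels.
feat = np.arange(2 * 3 * 4, dtype=float).reshape(2, 3, 4)
seqs = directional_sequences(feat)
restored = merge_sequences(seqs, 2, 3)
```

Each sequence has length $L = H \times W$, so a model that processes a sequence in time linear in $L$ keeps the overall cost linear in the number of spatial positions. In a real module, each sequence would pass through a sequence model before being merged.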

Paper Structure

This paper contains 17 sections, 6 equations, 5 figures, 5 tables.

Figures (5)

  • Figure 1: The SMamba-UNet framework, which primarily comprises a patch embedding layer, a Mamba-based encoder module, a Mamba-based decoder module, and a final projection layer.
  • Figure 2: The framework of the vision Mamba module and the improved 2D selective scan (ISS2D) module.
  • Figure 3: Qualitative results on the fastMRI and IXI datasets at a 2$\times$ upsampling factor. Yellow arrows mark significant differences between methods.
  • Figure 4: Qualitative results on the fastMRI and IXI datasets at a 4$\times$ upsampling factor. Yellow arrows mark significant differences between methods.
  • Figure 5: Ablation study with different patch sizes in the vision Mamba module.