Self-Prior Guided Mamba-UNet Networks for Medical Image Super-Resolution
Zexin Ji, Beiji Zou, Xiaoyan Kui, Pierre Vera, Su Ruan
TL;DR
This work addresses medical image super-resolution by overcoming CNNs' local bias and Transformers' computational burden through a self-prior guided Mamba-UNet (SMamba-UNet). It combines a Mamba-based UNet with self-prior learning via brightness perturbation and an improved ISS2D module to capture multi-scale, multi-directional dependencies with linear complexity $O(L)$ in sequence length. The approach uses a composite loss comprising an $L_1$ term and a perceptual term based on a pre-trained VGG19 network, enabling texture and brightness refinement in SR. Experiments on IXI and fastMRI demonstrate state-of-the-art PSNR/SSIM at 2× and 4× upsampling, highlighting efficient long-range modeling and effective self-exemplar learning for clinically relevant SR improvements.
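As a reading aid, the composite loss described above can be written in the following form. This is a sketch under assumptions: the summary names only an $L_1$ term and a VGG19-based perceptual term, so the weight $\lambda$, the feature extractor symbol $\phi$, the squared $\ell_2$ distance on features, and the image symbols $\hat{I}_{SR}$ and $I_{HR}$ are illustrative choices rather than the authors' stated formulation.

$$
\mathcal{L} = \underbrace{\lVert \hat{I}_{SR} - I_{HR} \rVert_1}_{L_1 \text{ term}} + \lambda \, \underbrace{\lVert \phi(\hat{I}_{SR}) - \phi(I_{HR}) \rVert_2^2}_{\text{perceptual term}}
$$

Here $\hat{I}_{SR}$ denotes the super-resolved output, $I_{HR}$ the high-resolution reference, and $\phi(\cdot)$ the features of a pre-trained VGG19 network at some assumed layer.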
Abstract
In this paper, we propose a self-prior guided Mamba-UNet network (SMamba-UNet) for medical image super-resolution. Existing methods are primarily based on convolutional neural networks (CNNs) or Transformers. CNN-based methods fail to capture long-range dependencies, while Transformer-based approaches incur heavy computational costs due to their quadratic complexity. Recently, State Space Models (SSMs), especially Mamba, have emerged as capable of modeling long-range dependencies with linear computational complexity. Inspired by Mamba, our approach learns self-prior multi-scale contextual features within a Mamba-UNet network, which may help to super-resolve low-resolution medical images efficiently. Specifically, we obtain self-priors through brightness-perturbed inpainting of the input image during network training, which lets the network learn detailed texture and brightness information that is beneficial for super-resolution. Furthermore, we combine Mamba with a UNet network to mine global features at different levels. We also design an improved 2D-Selective-Scan (ISS2D) module that divides image features into sequences along different directions to learn long-range dependencies in multiple directions, and adaptively fuses the sequence information to enhance the super-resolved feature representation. Both qualitative and quantitative experimental results demonstrate that our approach outperforms current state-of-the-art methods on two public medical datasets: IXI and fastMRI.
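To make the idea of directional sequences concrete, the following minimal Python sketch flattens a 2D feature grid into several 1D scan orders and fuses them back with softmax weights. It is an illustration under assumptions, not the authors' ISS2D module: the four scan orders, the function names `directional_sequences`, `restore`, and `fuse`, and the scalar per-direction fusion weights are all hypothetical, and the per-sequence state-space processing that would sit between scanning and fusion is omitted.

```python
import math
from typing import Dict, List

Grid = List[List[float]]


def directional_sequences(grid: Grid) -> Dict[str, List[float]]:
    """Flatten a 2D grid into four 1D scan orders (assumed directions)."""
    height, width = len(grid), len(grid[0])
    row_major = [v for row in grid for v in row]
    col_major = [grid[r][c] for c in range(width) for r in range(height)]
    return {
        "row_forward": row_major,
        "row_backward": row_major[::-1],
        "col_forward": col_major,
        "col_backward": col_major[::-1],
    }


def restore(name: str, seq: List[float], height: int, width: int) -> Grid:
    """Map a scanned sequence back to its original (row, col) layout."""
    if name.endswith("backward"):
        seq = seq[::-1]  # undo the reversal first
    if name.startswith("row"):
        return [seq[r * width:(r + 1) * width] for r in range(height)]
    # Column-major order: element (r, c) sits at index c * height + r.
    return [[seq[c * height + r] for c in range(width)] for r in range(height)]


def fuse(sequences: Dict[str, List[float]], logits: Dict[str, float],
         height: int, width: int) -> Grid:
    """Weighted sum of the restored grids, with softmax weights over directions.

    In a real model each sequence would first pass through a state-space
    layer and the logits would be learned; here they are plain inputs.
    """
    exps = {k: math.exp(v) for k, v in logits.items()}
    total = sum(exps.values())
    weights = {k: e / total for k, e in exps.items()}
    grids = {k: restore(k, s, height, width) for k, s in sequences.items()}
    return [
        [sum(weights[k] * grids[k][r][c] for k in grids) for c in range(width)]
        for r in range(height)
    ]


if __name__ == "__main__":
    features = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
    seqs = directional_sequences(features)
    equal_logits = {name: 0.0 for name in seqs}
    print(seqs["col_forward"])
    print(fuse(seqs, equal_logits, height=2, width=3))
```

Because no processing is applied to the sequences here, fusing with equal weights simply recovers the input grid, which is a convenient sanity check that the scan and restore steps are inverses of each other.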
