Global and Local Mamba Network for Multi-Modality Medical Image Super-Resolution
Zexin Ji, Beiji Zou, Xiaoyan Kui, Sebastien Thureau, Su Ruan
TL;DR
The paper addresses the challenge of efficiently achieving high-quality multi-modality medical image super-resolution by balancing global context and local detail. It introduces GLMamba, a two-branch architecture with a global Mamba branch for low-resolution inputs and a local Mamba branch for high-resolution reference images, augmented by deform blocks and modulators, a dedicated multi-modality feature fusion block, and a contrastive edge loss. Empirical results on BraTS2021 and IXI show improvements in PSNR, SSIM, and downstream segmentation Dice, along with competitive parameter efficiency compared with state-of-the-art methods. The approach holds practical potential for faster, more accurate clinical MR image super-resolution and downstream tasks, while future work includes extending to 3D data and integrating alignment with super-resolution in a joint framework.
Abstract
Convolutional neural networks and Transformers have made significant progress in multi-modality medical image super-resolution. However, these methods either have a fixed receptive field for local learning or impose a significant computational burden for global learning, which limits super-resolution performance. To address this problem, State Space Models, notably Mamba, have been introduced to efficiently model long-range dependencies in images with linear computational complexity. Building on Mamba and on the observation that low-resolution images rely on global information to compensate for missing details, while high-resolution reference images need to provide more local details for accurate super-resolution, we propose a global and local Mamba network (GLMamba) for multi-modality medical image super-resolution. Specifically, GLMamba is a two-branch network consisting of a global Mamba branch and a local Mamba branch. The global Mamba branch captures long-range relationships in low-resolution inputs, and the local Mamba branch focuses more on short-range details in high-resolution reference images. We also use a deform block to adaptively extract features in both branches and enhance their representation ability. A modulator is designed to further enhance the deformable features in both the global and local Mamba blocks. To fully integrate the reference image into low-resolution image super-resolution, we further develop a multi-modality feature fusion block that adaptively fuses features by considering similarities, differences, and complementary aspects between modalities. In addition, a contrastive edge loss (CELoss) is developed to enhance edge textures and contrast in medical images.
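
To make the linear-complexity claim for State Space Models concrete, the following is a minimal sketch of a discrete linear state-space recurrence in NumPy. It is not the GLMamba implementation: the function name `ssm_scan` and the toy matrices `A`, `B`, `C` are hypothetical, and the parameters here are fixed over time, whereas Mamba makes them depend on the input. The point is only that one pass over a sequence of length L takes L constant-size state updates.

```python
import numpy as np

def ssm_scan(x, A, B, C):
    """Run a discrete linear state-space recurrence over a 1-D sequence.

    h_t = A @ h_{t-1} + B * x_t
    y_t = C @ h_t

    The loop visits each element once, so the cost grows linearly
    with the sequence length for a fixed state size.
    """
    h = np.zeros(A.shape[0])      # hidden state, starts at zero
    y = np.empty(len(x))          # one scalar output per input step
    for t, x_t in enumerate(x):
        h = A @ h + B * x_t       # state update
        y[t] = C @ h              # readout
    return y

# Hypothetical toy parameters (illustration only, not from the paper).
A = np.diag([0.9, 0.5])           # two decaying state channels
B = np.array([1.0, 1.0])
C = np.array([0.5, 0.5])

x = np.array([1.0, 0.0, 0.0, 0.0])  # unit impulse
y = ssm_scan(x, A, B, C)
```

With an impulse input, each output mixes a slowly decaying channel (0.9) and a quickly decaying one (0.5), which gives an intuition for how a state-space layer can carry information across long ranges at linear cost.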
