Hi-Mamba: Hierarchical Mamba for Efficient Image Super-Resolution

Junbo Qiao, Jincheng Liao, Wei Li, Yulun Zhang, Yong Guo, Yi Wen, Zhangxizi Qiu, Jiao Xie, Jie Hu, Shaohui Lin

TL;DR

Hi-Mamba, a novel Hierarchical Mamba network for image super-resolution (SR), improves PSNR by 0.29 dB on Manga109 for $\times3$ SR over the strong lightweight MambaIR.
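To put the 0.29 dB figure in perspective: PSNR is $10\log_{10}(\mathrm{MAX}^2/\mathrm{MSE})$, so at a fixed peak value a PSNR gain of $g$ dB corresponds to multiplying the MSE by $10^{-g/10}$. The short Python sketch below computes both quantities. The function names are illustrative rather than taken from the paper, and the sketch ignores the Y-channel conversion and border cropping that SR benchmarks commonly apply. Because reported PSNR is averaged over images, the ratio holds per image rather than exactly for the dataset mean.

```python
import numpy as np


def psnr(ref: np.ndarray, out: np.ndarray, max_val: float = 255.0) -> float:
    """PSNR in dB: 10 * log10(MAX^2 / MSE), computed over all pixels and channels."""
    mse = np.mean((ref.astype(np.float64) - out.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")
    return float(10.0 * np.log10(max_val ** 2 / mse))


def mse_ratio_for_gain(gain_db: float) -> float:
    """MSE_new / MSE_old implied by a PSNR gain of `gain_db` at a fixed MAX."""
    return 10.0 ** (-gain_db / 10.0)


ratio = mse_ratio_for_gain(0.29)
print(f"MSE ratio {ratio:.4f}, i.e. {(1 - ratio) * 100:.1f}% lower MSE")
```

For $g = 0.29$ the ratio is about 0.935, i.e. roughly 6.5% lower MSE per image.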

Abstract

State Space Models (SSMs), such as Mamba, have shown strong representational ability in modeling long-range dependencies with linear complexity, and have been applied successfully to both high-level and low-level vision tasks. However, SSMs process inputs sequentially, so unfolding an image into a 1D sequence loses spatial dependencies, and multiple scans in different directions are needed to compensate. This multi-direction scanning strategy significantly increases computational overhead and becomes prohibitive for high-resolution image processing. To address this problem, we propose a novel Hierarchical Mamba network, Hi-Mamba, for image super-resolution (SR). Hi-Mamba has two key designs: (1) the Hierarchical Mamba Block (HMB), assembled from a Local SSM (L-SSM) and a Region SSM (R-SSM), both using single-direction scanning, aggregates multi-scale representations to enhance context modeling; (2) the Direction Alternation Hierarchical Mamba Group (DA-HMG) allocates isomeric single-direction scans across cascaded HMBs to enrich spatial relationship modeling. Extensive experiments on five benchmark datasets demonstrate the superiority of Hi-Mamba for efficient SR. For example, Hi-Mamba improves PSNR by 0.29 dB on Manga109 for $\times3$ SR over the strong lightweight MambaIR.
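The cost argument can be illustrated with a toy diagonal linear SSM. The sketch below assumes a time-invariant recurrence $h_t = a\,h_{t-1} + b\,x_t$, $y_t = c\,h_t$ per channel (Mamba's selective scan additionally makes these parameters input-dependent). It runs the recurrence once over a row-major unfolding of an $H \times W$ feature map, and contrasts that with a four-direction cross scan that sums forward and backward passes over rows and columns, a simplified form of the multi-direction strategy the abstract criticizes. The function names are hypothetical, and neither function is the paper's L-SSM or R-SSM.

```python
import numpy as np


def unfold_row_major(feat: np.ndarray) -> np.ndarray:
    """Flatten an (H, W, C) feature map into an (H*W, C) sequence, row by row."""
    h, w, c = feat.shape
    return feat.reshape(h * w, c)


def diagonal_ssm_scan(x: np.ndarray, a: np.ndarray, b: np.ndarray,
                      c: np.ndarray) -> np.ndarray:
    """Single-direction scan of a toy diagonal linear SSM over an (L, C) sequence.

    Per channel: h_t = a * h_{t-1} + b * x_t and y_t = c * h_t.
    One pass over the sequence, so the cost grows linearly with L.
    Assumption: a, b, c are fixed per channel (Mamba makes them input-dependent).
    """
    length, channels = x.shape
    state = np.zeros(channels)
    y = np.empty((length, channels))
    for t in range(length):
        state = a * state + b * x[t]
        y[t] = c * state
    return y


def four_direction_scan(feat: np.ndarray, a: np.ndarray, b: np.ndarray,
                        c: np.ndarray) -> np.ndarray:
    """Cross scan with 4 passes (rows, columns, and their reverses), outputs summed."""
    h, w, ch = feat.shape
    rows = unfold_row_major(feat)
    cols = unfold_row_major(feat.transpose(1, 0, 2))  # column-major visiting order
    y_rows = diagonal_ssm_scan(rows, a, b, c) + diagonal_ssm_scan(rows[::-1], a, b, c)[::-1]
    y_cols = diagonal_ssm_scan(cols, a, b, c) + diagonal_ssm_scan(cols[::-1], a, b, c)[::-1]
    # Map the column-major outputs back to row-major layout before merging.
    y_cols = y_cols.reshape(w, h, ch).transpose(1, 0, 2).reshape(h * w, ch)
    return (y_rows + y_cols).reshape(h, w, ch)


feat = np.random.default_rng(0).normal(size=(4, 4, 2))
a, b, c = np.array([0.9, 0.5]), np.ones(2), np.ones(2)
single = diagonal_ssm_scan(unfold_row_major(feat), a, b, c).reshape(feat.shape)  # 1 pass
multi = four_direction_scan(feat, a, b, c)                                        # 4 passes
```

Each pass is linear in $H \times W$, but the four-direction version performs four of them per block, whereas a single-direction SSM performs one.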

Paper Structure

This paper contains 17 sections, 12 equations, 10 figures, 5 tables.

Figures (10)

  • Figure 1: Illustration of the proposed Hi-Mamba. (a) Overview of the Hi-Mamba architecture with $N_2$ Direction Alternation Hierarchical Mamba Groups (DA-HMGs), each containing $N_1$ Hierarchical Mamba Blocks (HMBs) that use four isomeric single-direction scanning SSMs, denoted HMB-H/V/RH/RV. (b) The Hierarchical Mamba Block (HMB) consists of a Local-SSM, a Region-SSM, and a Gate Feed-Forward Network (G-FFN).
  • Figure 2: Illustration of the key components in HMB.
  • Figure 3: Qualitative comparison on the "img004" image of Urban100 for $\times$4 SR.
  • Figure 4: LAM visualization [gu2021interpreting] on the $\times2$ SR task. LAM indicates how significant each pixel of the LR input is to the SR patch outlined by the red box. Hi-Mamba exploits information from a broader range of pixels, yielding better performance.
  • Figure 5: Performance on Urban100 for $\times 2$ SR. Larger circles indicate more parameters (Params).
  • ...and 5 more figures
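Figure 1 labels the four single-direction scans HMB-H/V/RH/RV and places $N_1$ HMBs in each of $N_2$ DA-HMGs. The sketch below assumes that these labels mean horizontal, vertical, reversed-horizontal, and reversed-vertical raster orders over an $H \times W$ map, and that the directions cycle through the HMBs of each group. Both are assumptions for illustration, and `scan_order`, `inverse_order`, and `direction_schedule` are hypothetical helpers, not code from the paper.

```python
import numpy as np


def scan_order(h: int, w: int, direction: str) -> np.ndarray:
    """Row-major indices of an H*W map, in the order a scan visits them.

    Assumed mapping: H = rows left to right, V = columns top to bottom,
    RH / RV = the same orders reversed.
    """
    grid = np.arange(h * w).reshape(h, w)
    orders = {
        "H": grid.reshape(-1),
        "V": grid.T.reshape(-1),
        "RH": grid.reshape(-1)[::-1],
        "RV": grid.T.reshape(-1)[::-1],
    }
    return orders[direction]


def inverse_order(order: np.ndarray) -> np.ndarray:
    """Permutation that restores row-major layout after scanning in `order`."""
    inv = np.empty_like(order)
    inv[order] = np.arange(order.size)
    return inv


def direction_schedule(num_groups: int, blocks_per_group: int,
                       cycle=("H", "V", "RH", "RV")) -> list:
    """Assign one single-direction scan per block, cycling within each group (assumed)."""
    return [[cycle[i % len(cycle)] for i in range(blocks_per_group)]
            for _ in range(num_groups)]


h, w = 2, 3
flat = np.arange(h * w) * 10
order = scan_order(h, w, "V")
seq = flat[order]                     # sequence seen by a vertical scan: [0, 30, 10, 40, 20, 50]
restored = seq[inverse_order(order)]  # back to row-major layout
schedule = direction_schedule(num_groups=2, blocks_per_group=4)
```

Because every block scans only once, the reordering and its inverse are cheap index permutations, and alternating the direction between consecutive blocks lets the stack as a whole see all four orders.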