Multi-Scale Representation Learning for Image Restoration with State-Space Model
Yuhong He, Long Peng, Qiaosi Yi, Chen Wu, Lu Wang
TL;DR
The paper tackles real-world image restoration under multi-scale degradations by introducing MS-Mamba, an efficient multi-scale state-space modeling framework embedded in a UNet backbone. A Hierarchical Mamba Block combines global and regional state-space modules (GSSM and RSSM) to capture global, regional, and local features, while an Adaptive Gradient Block (AGB) and a Residual Fourier Block (RFB) strengthen detail extraction. The network is trained with a composite loss $\mathcal{L}_{total} = \lambda_1 \mathcal{L}_1 + \lambda_2 \mathcal{L}_{edge} + \lambda_3 \mathcal{L}_{fft}$. The approach achieves state-of-the-art results on nine public benchmarks across four restoration tasks (deraining, dehazing, denoising, and low-light enhancement) at lower computational cost than Transformer-heavy methods. Combining global and regional multi-scale SSMs with frequency- and gradient-based detail modeling yields practical improvements for real-world image restoration, supported by quantitative gains and a user study indicating strong perceptual quality.
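To make the three-term loss above concrete, the following is a minimal NumPy sketch for single-channel images. It is not the authors' implementation: the summary does not define the individual terms, so the edge term (L1 distance between forward-difference gradients), the frequency term (mean magnitude of the difference between 2D spectra), and the default weights `lam1`, `lam2`, `lam3` are all assumptions chosen for illustration.

```python
import numpy as np


def l1_loss(pred, target):
    # Mean absolute pixel error.
    return float(np.mean(np.abs(pred - target)))


def _forward_differences(img):
    # Horizontal and vertical forward differences (assumed edge operator).
    dx = img[:, 1:] - img[:, :-1]
    dy = img[1:, :] - img[:-1, :]
    return dx, dy


def edge_loss(pred, target):
    # Assumption: compare image gradients with an L1 distance.
    pdx, pdy = _forward_differences(pred)
    tdx, tdy = _forward_differences(target)
    return float(np.mean(np.abs(pdx - tdx)) + np.mean(np.abs(pdy - tdy)))


def fft_loss(pred, target):
    # Assumption: mean magnitude of the difference between the 2D spectra.
    diff = np.fft.fft2(pred) - np.fft.fft2(target)
    return float(np.mean(np.abs(diff)))


def total_loss(pred, target, lam1=1.0, lam2=0.05, lam3=0.01):
    # Weighted sum of the three terms; the default weights are hypothetical.
    return (lam1 * l1_loss(pred, target)
            + lam2 * edge_loss(pred, target)
            + lam3 * fft_loss(pred, target))
```

For example, `total_loss(restored, clean)` returns a single scalar for two arrays of the same shape. In this sketch a constant brightness offset leaves the edge term at zero, while the pixel and frequency terms both respond to it, which shows how the terms penalize different kinds of error.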
Abstract
Image restoration aims to reconstruct a high-quality, detail-rich image from a degraded counterpart, a pivotal step in photography and many computer vision systems. In real-world scenarios, different types of degradation cause the loss of image details at various scales and reduce image contrast. Existing methods predominantly rely on CNNs and Transformers to capture multi-scale representations. However, these methods are limited by the high computational complexity of Transformers and the constrained receptive field of CNNs, which prevents them from achieving both strong performance and efficiency in image restoration. To address these challenges, we propose a novel Multi-Scale State-Space Model-based framework (MS-Mamba) for efficient image restoration, which enhances the capacity for multi-scale representation learning through our proposed global and regional SSM modules. Additionally, an Adaptive Gradient Block (AGB) and a Residual Fourier Block (RFB) are proposed to improve the network's detail extraction capabilities by capturing gradients in various directions and facilitating the learning of details in the frequency domain. Extensive experiments on nine public benchmarks across four classic image restoration tasks (image deraining, dehazing, denoising, and low-light enhancement) demonstrate that our proposed method achieves new state-of-the-art performance while maintaining low computational complexity. The source code will be publicly available.
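The two detail-oriented views mentioned in the abstract, gradients in several directions and processing in the frequency domain, can be illustrated with a small NumPy sketch. This is only an illustration of the underlying signal operations, not the AGB or RFB themselves, whose learned layers are not described here. The function names, the choice of four directions, and the `gain` parameter are hypothetical.

```python
import numpy as np


def directional_gradients(img):
    # Finite differences toward four neighbors, with edge-replicated padding
    # so that every output map has the same shape as the input.
    p = np.pad(img, 1, mode="edge")
    center = p[1:-1, 1:-1]
    return {
        "horizontal": p[1:-1, 2:] - center,
        "vertical": p[2:, 1:-1] - center,
        "diagonal": p[2:, 2:] - center,
        "anti_diagonal": p[2:, :-2] - center,
    }


def residual_fourier(img, gain):
    # Scale the spectrum by `gain` (a scalar or an array broadcastable to the
    # rfft2 output), transform back, and add the result to the input as a
    # residual branch. A learned block would replace `gain` with trainable
    # parameters; that part is not modeled here.
    spectrum = np.fft.rfft2(img)
    filtered = np.fft.irfft2(spectrum * gain, s=img.shape)
    return img + filtered
```

With `gain=0.0` the residual branch contributes nothing and the input is returned unchanged, which is a quick sanity check on the skip connection in this sketch.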
