A Mountain-Shaped Single-Stage Network for Accurate Image Restoration

Hu Gao, Jing Yang, Ying Zhang, Ning Wang, Jingfan Yang, Depeng Dang

TL;DR

The paper tackles the challenge of balancing spatial detail and contextual information in image restoration while keeping computational costs low. It introduces M3SNet, a mountain-shaped single-stage network based on a U‑Net backbone, augmented with a feature fusion middleware (FFM) and a multi-head attention middle block (MHAMB) to enable cross-scale and global information exchange within one pass. By employing activation-free blocks (NAFBlock) and residual fusion, M3SNet achieves state-of-the-art results on deraining and deblurring tasks across six datasets with substantially reduced MACs. The approach demonstrates strong generalization (e.g., GoPro to HIDE) and provides a practical, efficient solution for high-quality image restoration.
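
To make the MACs figure concrete, the short sketch below counts multiply-accumulate operations for a single 2D convolution using the standard formula (output positions × output channels × input channels per group × kernel area). The function name `conv2d_macs` and every layer size are hypothetical values chosen for illustration; they are not taken from M3SNet.

```python
def conv2d_macs(h_out, w_out, c_in, c_out, kernel, groups=1):
    """Multiply-accumulate count for one 2D convolution (bias ignored)."""
    return h_out * w_out * c_out * (c_in // groups) * kernel * kernel


# Hypothetical layer sizes, for illustration only (not the paper's configuration).
full_res = conv2d_macs(256, 256, 32, 32, 3)              # dense 3x3 conv at full resolution
half_res = conv2d_macs(128, 128, 32, 32, 3)              # same conv after 2x downsampling
depthwise = conv2d_macs(256, 256, 32, 32, 3, groups=32)  # depthwise 3x3 conv

print(full_res, half_res, depthwise)
```

Halving the spatial resolution cuts the cost of the same layer by a factor of four, which is one reason encoder-decoder designs can keep total MACs low while still processing features at several scales.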

Abstract

Image restoration aims to recover a high-quality image from a corrupted input, as in deblurring and deraining. It typically requires maintaining a delicate balance between spatial details and contextual information. Although a multi-stage network can balance these competing goals well and achieve strong performance, it also increases the system's complexity. In this paper, we propose a mountain-shaped single-stage design based on a simple U-Net architecture, which removes or replaces unnecessary nonlinear activation functions to achieve this balance with low system complexity. Specifically, we propose a feature fusion middleware (FFM) mechanism as an information exchange component between the levels of the encoder-decoder architecture. It integrates upper-layer information into the adjacent lower layer, proceeding sequentially down to the lowest layer. Finally, all information is fused at the level that operates on the original image resolution. This preserves spatial details and integrates contextual information, supporting high-quality image restoration. In addition, we propose a multi-head attention middle block (MHAMB) as a bridge between the encoder and decoder to capture more global information and overcome the limited receptive field of CNNs. Extensive experiments demonstrate that our approach, named M3SNet, outperforms previous state-of-the-art models at less than half the computational cost on several image restoration tasks, such as image deraining and deblurring.
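
The top-down fusion described in the abstract can be illustrated with a deliberately simplified sketch. It is one plausible reading of the text, not the authors' FFM: "upper" is taken to mean higher resolution, and plain 2×2 averaging, addition, and nearest-neighbour upsampling stand in for whatever learned operators the paper uses. The names `avg_pool2`, `upsample_nearest`, `add`, and `fuse_top_down` are hypothetical.

```python
def avg_pool2(x):
    """Halve the resolution by averaging each 2x2 block (x is a list of rows)."""
    return [[(x[i][j] + x[i][j + 1] + x[i + 1][j] + x[i + 1][j + 1]) / 4.0
             for j in range(0, len(x[0]), 2)]
            for i in range(0, len(x), 2)]


def upsample_nearest(x, factor):
    """Enlarge a 2D map by repeating each value factor x factor times."""
    return [[x[i // factor][j // factor] for j in range(len(x[0]) * factor)]
            for i in range(len(x) * factor)]


def add(a, b):
    """Elementwise sum of two 2D maps of the same size."""
    return [[p + q for p, q in zip(row_a, row_b)] for row_a, row_b in zip(a, b)]


def fuse_top_down(levels):
    """levels[0] is full resolution; each following level is half the size."""
    fused = [levels[0]]
    for lower in levels[1:]:
        # Pass the already-fused upper level down into its adjacent lower level.
        fused.append(add(lower, avg_pool2(fused[-1])))
    # Bring every level back to full resolution and sum them there.
    out = fused[0]
    for k, level in enumerate(fused[1:], start=1):
        out = add(out, upsample_nearest(level, 2 ** k))
    return fused, out


# Toy single-channel pyramid: 4x4, 2x2, and 1x1 maps filled with ones.
levels = [[[1] * 4 for _ in range(4)], [[1] * 2 for _ in range(2)], [[1]]]
fused, out = fuse_top_down(levels)
```

In this toy run each lower level accumulates what the levels above it carry, and the final map at full resolution combines all of them, which mirrors the "sequentially down to the lowest layer, then fuse at the original resolution" description.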

Paper Structure

This paper contains 15 sections, 15 equations, 7 figures, and 5 tables.

Figures (7)

  • Figure 1: Visualized results of M3SNet on various image restoration tasks. Left: degraded image. Right: the result predicted by M3SNet. From top to bottom: image deblurring and image deraining, respectively.
  • Figure 2: PSNR vs. computational cost on image deblurring. Under different parameter capacities, our model achieves state-of-the-art performance. In addition, our model involves only a relatively small number of multiply-accumulate operations (MACs).
  • Figure 3: Architecture of M3SNet for image restoration.
  • Figure 4: (a) Feature fusion middleware (FFM), which enables the exchange of information across multiple scales while preserving fine details. (b) The architecture of the nonlinear activation free block (NAFBlock). (c) Simplified Channel Attention (SCA). (d) Multi-head attention middle block (MHAMB), which captures more global information.
  • Figure 5: Visualized results of M3SNet on various image restoration tasks. For each image pair, the upper image is degraded and the lower one is predicted by M3SNet.
  • ...and 2 more figures
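
As a rough illustration of the activation-free components named in the Figure 4 caption, the sketch below shows a channel-splitting gate and a simplified channel attention in their commonly described form: the gate multiplies two halves of the channels elementwise, and the attention rescales each channel by a linear function of the per-channel global means. This is an assumption-laden sketch in plain Python, not the block layout drawn in Figure 4; the function names, the `weight` matrix, and the `bias` vector are hypothetical stand-ins for learned parameters.

```python
def simple_gate(x):
    """Split the channels into two halves and multiply them elementwise."""
    half = len(x) // 2
    return [[[a * b for a, b in zip(row_a, row_b)]
             for row_a, row_b in zip(ch_a, ch_b)]
            for ch_a, ch_b in zip(x[:half], x[half:])]


def simplified_channel_attention(x, weight, bias):
    """Rescale each channel by a linear function of the per-channel global means."""
    means = [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in x]
    scales = [sum(w * m for w, m in zip(w_row, means)) + b
              for w_row, b in zip(weight, bias)]
    return [[[v * s for v in row] for row in ch] for ch, s in zip(x, scales)]


# Toy feature map: 4 channels of size 2x2, channel c filled with the value c + 1.
x = [[[float(c + 1)] * 2 for _ in range(2)] for c in range(4)]
gated = simple_gate(x)  # 2 channels: 1*3 and 2*4

# Hypothetical "learned" parameters, fixed by hand for the example.
weight = [[1.0, 0.0], [0.0, 0.5]]
bias = [0.0, 0.0]
attended = simplified_channel_attention(gated, weight, bias)
```

Neither function applies a ReLU, GELU, or sigmoid: the only nonlinearity comes from multiplying features with each other, which is the sense in which such blocks are called activation free.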