
CascadedGaze: Efficiency in Global Context Extraction for Image Restoration

Amirhosein Ghasemabadi, Muhammad Kamran Janjua, Mohammad Salameh, Chunhua Zhou, Fengyu Sun, Di Niu

TL;DR

This work tackles the challenge of incorporating global context in image restoration without the heavy costs of self-attention. It introduces CascadedGaze Network (CGNet), a fully convolutional encoder–decoder that employs a Global Context Extractor (GCE) to learn local and global dependencies through cascaded small-kernel depthwise convolutions, followed by a Range Fuser that combines contexts with channel attention. CGNet achieves state-of-the-art efficiency and competitive or superior PSNR across real denoising (SIDD), Gaussian denoising (multiple datasets), and single-image deblurring (GoPro), while reducing MACs and inference time relative to Transformer-based methods. Ablation studies validate channel merging, kernel-size progression, GCE placement, and convolutional design as key factors for performance and efficiency. The approach demonstrates a practical, scalable path to global-context learning in low-level vision tasks with broad potential for further extension.
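The efficiency claim rests on how cost scales with image size. As a rough illustration using standard layer-cost formulas (not numbers from the paper), the sketch below counts multiply-accumulates (MACs) for two designs on the same feature map. The first is a cascade of three 3×3 depthwise convolutions, each followed by a 1×1 convolution. The second is one layer of full spatial self-attention. The 64×64×32 feature-map size and the layer mix are assumptions. Many restoration Transformers use cheaper attention variants (Restormer, for example, computes attention across channels rather than pixels), so this shows the scaling gap, not a comparison with any specific model.

```python
def depthwise_conv_macs(h, w, c, k):
    """MACs of a k x k depthwise conv (stride 1): each output value costs k*k MACs."""
    return h * w * c * k * k


def pointwise_conv_macs(h, w, c_in, c_out):
    """MACs of a 1x1 conv mixing c_in input channels into c_out output channels."""
    return h * w * c_in * c_out


def global_self_attention_macs(h, w, c):
    """MACs of single-head self-attention over all N = h*w tokens of width c.

    Q, K, V and output projections: 4 * N * c^2.
    Q @ K^T and attn @ V: 2 * N^2 * c.
    Softmax, normalization and bias terms are ignored.
    """
    n = h * w
    return 4 * n * c * c + 2 * n * n * c


def cascade_macs(h, w, c, kernel_sizes):
    """Hypothetical cascade: depthwise convs at full resolution, each followed by a 1x1 conv."""
    return sum(depthwise_conv_macs(h, w, c, k) + pointwise_conv_macs(h, w, c, c)
               for k in kernel_sizes)


if __name__ == "__main__":
    h, w, c = 64, 64, 32  # assumed feature-map size
    conv = cascade_macs(h, w, c, [3, 3, 3])
    attn = global_self_attention_macs(h, w, c)
    print(f"cascade: {conv:,} MACs, attention: {attn:,} MACs, ratio ~{attn / conv:.0f}x")
```

The convolutional cost grows linearly in H·W, while the N² terms of attention grow quadratically: doubling each spatial side multiplies the convolution cost by 4 but the dominant attention term by about 16.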

Abstract

Image restoration tasks traditionally rely on convolutional neural networks. However, given the local nature of the convolutional operator, they struggle to capture global information. The promise of attention mechanisms in Transformers is to circumvent this problem, but it comes at the cost of intensive computational overhead. Many recent studies in image restoration have focused on solving the challenge of balancing performance and computational cost via Transformer variants. In this paper, we present CascadedGaze Network (CGNet), an encoder-decoder architecture that employs a Global Context Extractor (GCE), a novel and efficient way to capture global information for image restoration. The GCE module leverages small kernels across convolutional layers to learn global dependencies, without requiring self-attention. Extensive experimental results show that our computationally efficient approach performs competitively with a range of state-of-the-art methods on synthetic image denoising and single image deblurring tasks, and pushes the performance boundary further on the real image denoising task.
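Why can small kernels capture wide context? Stacking convolutions enlarges the input region that each output pixel depends on. The sketch below applies the standard receptive-field recurrence to a few stacks. The summary above does not state the GCE's strides, dilations or exact kernel sizes, so every configuration here is hypothetical.

```python
def receptive_field(layers):
    """Side length (in pixels) of the input region seen by one output pixel.

    layers: (kernel_size, stride, dilation) tuples, applied in order.
    Standard recurrence: r += (k - 1) * d * j, then j *= s, where j is the
    product of the strides of all earlier layers.
    """
    r, j = 1, 1
    for k, s, d in layers:
        r += (k - 1) * d * j
        j *= s
    return r


if __name__ == "__main__":
    # Hypothetical configurations, not taken from the paper.
    configs = {
        "three 3x3, stride 1": [(3, 1, 1)] * 3,
        "3x3, 5x5, 7x7, stride 1": [(3, 1, 1), (5, 1, 1), (7, 1, 1)],
        "three 3x3, stride 2": [(3, 2, 1)] * 3,
        "three 3x3, dilation 1/2/4": [(3, 1, 1), (3, 1, 2), (3, 1, 4)],
    }
    for name, layers in configs.items():
        print(f"{name}: {receptive_field(layers)} px")
```

With stride 1 the receptive field grows only linearly with depth (7 px after three 3×3 layers). Striding or dilation makes it grow geometrically: L stride-2 3×3 layers reach 2^(L+1) - 1 pixels. This is one way a short cascade of small kernels can approach image-wide context.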

Paper Structure

This paper contains 36 sections, 2 equations, 9 figures, and 11 tables.

Figures (9)

  • Figure 1: Computational Efficiency vs Performance. Left: PSNR vs. MACs (G) comparison on SIDD real image denoising. Right: PSNR vs. MACs (G) comparison on Gaussian image denoising tested on Kodak24 dataset with noise level $\sigma = 50$. Our model achieves state-of-the-art results and is computationally efficient.
  • Figure 2: Architecture Diagram. (a) Illustration of the overall architecture of the CascadedGaze Network (CGNet). Each encoder layer comprises $N_g$ CascadedGaze blocks. (b) Each CascadedGaze block is composed of (c) the GCE module and (d) the Range Fuser. The GCE module has three depthwise convolutions, followed by pointwise convolutions and a GELU activation.
  • Figure 3: GCE module. We visualize the depthwise separable convolution layers to illustrate how context is captured at different levels. The spatial range of each convolution is depicted in the input feature block in its corresponding color.
  • Figure 4: Block Comparison. Diagram comparing the CascadedGaze block with the NAF block.
  • Figure 5: Qualitative Comparison on Gaussian Denoising. Visual results of Gaussian image denoising on the Kodak24 [franzen1999kodak] dataset. We compare with Restormer [res:2], the best-performing prior method on this dataset. Our method, CGNet, restores finer details and produces more visually pleasing outputs. The PSNR score for each image is shown at the top of the figure.
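Figures 2 and 3 describe the GCE as three cascaded depthwise convolutions whose outputs cover progressively larger spatial ranges, followed by pointwise convolutions, GELU, and a Range Fuser with channel attention. The framework-free sketch below mimics that flow on tiny lists so the growing footprint can be checked by hand. The wiring is an assumption and should not be read as the authors' implementation. The contexts are summed, mixed by a 1×1 convolution, passed through GELU, and then scaled by a simple sigmoid-of-mean channel gate. That gate stands in for the paper's channel attention.

```python
import math

# A feature map is a list of channels; each channel is an H x W list of lists.
# Everything below is a hypothetical, framework-free illustration.


def depthwise3x3(x, kernels):
    """One 3x3 kernel per channel (no channel mixing), zero padding, stride 1."""
    out = []
    for ch, k in zip(x, kernels):
        h, w = len(ch), len(ch[0])
        res = [[0.0] * w for _ in range(h)]
        for i in range(h):
            for j in range(w):
                acc = 0.0
                for di in (-1, 0, 1):
                    for dj in (-1, 0, 1):
                        ii, jj = i + di, j + dj
                        if 0 <= ii < h and 0 <= jj < w:
                            acc += ch[ii][jj] * k[di + 1][dj + 1]
                res[i][j] = acc
        out.append(res)
    return out


def pointwise(x, weight):
    """1x1 convolution: out[o] = sum over c of weight[o][c] * x[c]."""
    h, w = len(x[0]), len(x[0][0])
    return [[[sum(weight[o][c] * x[c][i][j] for c in range(len(x)))
              for j in range(w)] for i in range(h)]
            for o in range(len(weight))]


def gelu(v):
    """Exact (erf-based) GELU."""
    return 0.5 * v * (1.0 + math.erf(v / math.sqrt(2.0)))


def channel_attention(x):
    """Hypothetical stand-in gate: scale each channel by sigmoid(its spatial mean)."""
    out = []
    for ch in x:
        mean = sum(sum(row) for row in ch) / (len(ch) * len(ch[0]))
        gate = 1.0 / (1.0 + math.exp(-mean))
        out.append([[gate * v for v in row] for row in ch])
    return out


def cascade_contexts(x, dw_kernels):
    """Apply depthwise convs in sequence and keep every stage's output as a context."""
    contexts, feat = [], x
    for stage_kernels in dw_kernels:
        feat = depthwise3x3(feat, stage_kernels)
        contexts.append(feat)
    return contexts


def gce_sketch(x, dw_kernels, pw_weight):
    """Hypothetical wiring: sum contexts, 1x1 conv, GELU, then channel gate."""
    contexts = cascade_contexts(x, dw_kernels)
    n_ch, h, w = len(x), len(x[0]), len(x[0][0])
    fused = [[[sum(ctx[c][i][j] for ctx in contexts) for j in range(w)]
              for i in range(h)] for c in range(n_ch)]
    mixed = pointwise(fused, pw_weight)
    activated = [[[gelu(v) for v in row] for row in ch] for ch in mixed]
    return channel_attention(activated)


def support(ch):
    """Number of nonzero pixels in one channel."""
    return sum(1 for row in ch for v in row if v != 0.0)


if __name__ == "__main__":
    # A single bright pixel per channel shows how the footprint grows per stage.
    delta = [[[1.0 if (i, j) == (4, 4) else 0.0 for j in range(9)]
              for i in range(9)] for _ in range(2)]
    ones = [[1.0] * 3 for _ in range(3)]
    kernels = [[ones, ones] for _ in range(3)]  # 3 stages x 2 channels
    print([support(c[0]) for c in cascade_contexts(delta, kernels)])
    out = gce_sketch(delta, kernels, [[1.0, 0.0], [0.0, 1.0]])
    print(len(out), len(out[0]), len(out[0][0]))
```

With all-ones kernels, the nonzero footprint of a single pixel grows from 9 to 25 to 49 pixels (3×3, 5×5, 7×7). This mirrors the "different levels" of context in Figure 3, and the fused output keeps the input's channel count and spatial size.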