GAMA-IR: Global Additive Multidimensional Averaging for Fast Image Restoration

Youssef Mansour, Reinhard Heckel

TL;DR

GAMA-IR addresses the need for fast, memory-efficient image restoration without sacrificing quality. It introduces the GAMA block, which captures global context via global averaging across all dimensions and a trio of lightweight 7×7 convolutions, enabling large receptive fields in a shallow network. The encoder–decoder architecture with skip connections and 1×1 down/upsampling sustains high performance while reducing latency and memory use. Across real-world denoising (SIDD), deblurring, deraining, and Gaussian denoising, GAMA-IR achieves competitive or superior PSNR/SSIM with substantially lower latency and memory on GPUs, notably surpassing Restormer and NAFNet on SIDD by about 0.11 dB while being 2–10× faster. The work highlights that optimizing for GPU-centric metrics (latency and memory) yields practical speedups without compromising restoration quality, making it appealing for real-time or resource-constrained deployments.
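To see why a shallow, purely convolutional network struggles to reach a large receptive field, the standard receptive-field recurrence can be worked through for 7×7 kernels. This is only a sketch: the function names are hypothetical, and the sequential stride-1 stack is an assumption made for comparison, not the arrangement of the three convolutions inside the GAMA block (which the summary above does not specify).

```python
def receptive_field(kernel_sizes, strides=None):
    """Receptive field, in input pixels along one axis, of a sequential conv stack."""
    strides = strides or [1] * len(kernel_sizes)
    rf, jump = 1, 1  # jump: input-pixel distance between neighbouring outputs
    for k, s in zip(kernel_sizes, strides):
        rf += (k - 1) * jump
        jump *= s
    return rf


def layers_needed(target, kernel=7):
    """Smallest number of stride-1 convs of size `kernel` whose receptive field covers `target` pixels."""
    n = 0
    while receptive_field([kernel] * n) < target:
        n += 1
    return n


# Three stacked 7x7 stride-1 convolutions see a 19x19 window.
print(receptive_field([7, 7, 7]))  # 19

# Covering a 256-pixel side with 7x7 convolutions alone takes 43 layers.
print(layers_needed(256))  # 43
```

A global average, by contrast, depends on every position along the averaged dimension in a single step, which is the motivation the summary gives for combining averaging with a few convolutions.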

Abstract

Deep learning-based methods have shown remarkable success for various image restoration tasks such as denoising and deblurring. The current state-of-the-art networks are relatively deep and utilize (variants of) self-attention mechanisms. Those networks are significantly slower than shallow convolutional networks, which, however, perform worse. In this paper, we introduce an image restoration network that is both fast and yields excellent image quality. The network is designed to minimize the latency and memory consumption when executed on a standard GPU, while maintaining state-of-the-art performance. The network is a simple shallow network with an efficient block that implements global additive multidimensional averaging operations. This block can capture global information and enable a large receptive field even when used in shallow networks with minimal computational overhead. Through extensive experiments and evaluations on diverse tasks, we demonstrate that our network achieves comparable or even superior results to existing state-of-the-art image restoration networks with less latency. For instance, we exceed the state-of-the-art result on real-world SIDD denoising by 0.11 dB, while being 2 to 10 times faster.
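The idea of additive averaging over several dimensions can be illustrated with a small NumPy sketch. It rests on loud assumptions: "averaging along a dimension" is taken here to mean the mean over one axis of a (C, H, W) feature map, broadcast back and added to the input, and the learned convolutions and transposes of the actual GAMA block are omitted entirely. The function name is hypothetical and this is not the authors' implementation.

```python
import numpy as np


def global_additive_averaging(x):
    """Add to each entry of x (shape C, H, W) the mean of x along each single axis.

    Assumption: a parameter-free stand-in for the averaging part of the block only.
    """
    mean_c = x.mean(axis=0, keepdims=True)  # (1, H, W): averaged over channels
    mean_h = x.mean(axis=1, keepdims=True)  # (C, 1, W): averaged over height
    mean_w = x.mean(axis=2, keepdims=True)  # (C, H, 1): averaged over width
    return x + mean_c + mean_h + mean_w     # broadcasting restores (C, H, W)


# A single nonzero entry in an otherwise empty 4x8x8 feature map.
x = np.zeros((4, 8, 8))
x[0, 0, 0] = 1.0

y1 = global_additive_averaging(x)
y2 = global_additive_averaging(y1)
y3 = global_additive_averaging(y2)

# Number of positions influenced by that one entry after 1, 2, and 3 applications.
print(np.count_nonzero(y1), np.count_nonzero(y2), np.count_nonzero(y3))  # 18 109 256
```

Under these assumptions, three applications already spread one pixel's influence over all 256 entries, regardless of spatial size, at the cost of three means and three additions per application.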

Paper Structure (14 sections, 8 figures, 5 tables)

This paper contains 14 sections, 8 figures, 5 tables.

Figures (8)

  • Figure 1: Performance and speed of our GAMA-IR network in comparison to other popular and state-of-the-art networks on two common restoration tasks: image denoising and deblurring. GAMA-IR achieves slightly better performance than state-of-the-art networks, while being significantly faster.
  • Figure 2: Correlation of FLOPs and parameter count with latency and memory of image restoration algorithms run on an NVIDIA RTX A600 GPU. Latency and memory are measured as the time and memory required for a forward pass through the network at inference. All metrics are considered for an image input size of $1\times 3 \times 256 \times 256$, which is a single 3-channel RGB image (i.e., the batch size is one). The networks' parameters are varied to create different sizes of the same network. Note the moderate to weak correlation of FLOPs and parameter count with memory and latency.
  • Figure 3: The architecture of our proposed network. Similar to the UNet, the network has an encoder-decoder structure with skip connections adding the feature maps of equal resolution elementwise. The multi-resolution layout is achieved by downsampling and upsampling operations. The network has only 2 hyperparameters, the depth and width. The depth is the total number of building blocks, and the width is the number of channels $C$.
  • Figure 4: a) Non-Local Block [nonlocal_blocks]. b) Squeeze-and-Excite Block [squeeze_excite]. c) Our proposed GAMA Block. $\otimes$ denotes matrix multiplication; $\oplus$ and $\circledast$ denote elementwise addition and multiplication, respectively. The circular arrows in c) denote a transpose operation. The blocks aid in capturing long-range dependencies in data. Non-local blocks calculate interactions between all pixels (matrix multiplications), while the other blocks rely on global averaging to summarize the entire feature map across a specific dimension.
  • Figure 5: The ability of the underparameterized network illustrated above to memorize an image with more pixels than the network has parameters, for different choices of the block. The block in the network is either Squeeze-and-Excite, GAMA, or the identity in the case of the Plain CNN. The total number of parameters is the same for each network variant. The chessboard has a repetitive pattern that the networks with the larger receptive fields can exploit, whereas the baboon image has less such structure, which is why we see similar denoising performance for it.
  • ...and 3 more figures
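
The latency protocol described in the Figure 2 caption (time for one forward pass on a $1\times 3 \times 256 \times 256$ input) can be sketched as follows. This is a CPU-only illustration under stated assumptions: `measure_latency` and `toy_forward` are hypothetical names, the toy function stands in for a real restoration network, and the warm-up count, repeat count, and use of the median are choices made here, not details taken from the paper.

```python
import statistics
import time

import numpy as np


def measure_latency(forward, x, warmup=3, repeats=10):
    """Median wall-clock time in seconds of a single call forward(x)."""
    for _ in range(warmup):  # warm-up calls are not timed
        forward(x)
    times = []
    for _ in range(repeats):
        start = time.perf_counter()
        forward(x)
        times.append(time.perf_counter() - start)
    return statistics.median(times)


def toy_forward(x):
    # Hypothetical stand-in for a network: adds the per-channel spatial mean.
    return x + x.mean(axis=(2, 3), keepdims=True)


# One 3-channel 256x256 image, batch size one, as in the caption.
x = np.random.rand(1, 3, 256, 256).astype(np.float32)
latency = measure_latency(toy_forward, x)
print(f"median latency: {latency * 1e3:.3f} ms")
```

On a GPU, kernels are launched asynchronously, so a real measurement would also need to synchronize the device before reading the clock; that step is left out of this sketch.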