Ultra-High-Definition Image Deblurring via Multi-scale Cubic-Mixer

Xingchi Chen, Xiuyi Jia, Zhuoran Zheng

TL;DR

The paper tackles UHD image deblurring with real-time performance by introducing a frequency-domain, MLP-based framework called Multi-scale Cubic-Mixer. It processes the real and imaginary parts of FFT coefficients via Wave-Frequency Processing, uses a multi-scale cubic-mixer to model long-range dependencies in the frequency domain, and employs a slicing scheme to reconstruct arbitrary-size UHD images with high fidelity. The approach achieves competitive or superior PSNR/SSIM on 4KRD and other benchmarks while delivering real-time throughput on a single GPU, outperforming several state-of-the-art methods in UHD settings. This method offers practical impact for on-device UHD deblurring in consumer devices and streaming applications, with potential extensions to related restoration tasks and downstream vision systems.

Abstract

Currently, transformer-based algorithms are making a splash in the domain of image deblurring. Their success depends on the self-attention mechanism with a CNN stem to model long-range dependencies between tokens. Unfortunately, this appealing pipeline introduces high computational complexity and makes it difficult to process an ultra-high-definition image on a single GPU in real time. To trade off accuracy and efficiency, the input degraded image is computed cyclically over three-dimensional ($C$, $W$, and $H$) signals without a self-attention mechanism. We term this deep network Multi-scale Cubic-Mixer; it acts on both the real and imaginary components after a fast Fourier transform to estimate the Fourier coefficients and thus obtain a deblurred image. Furthermore, we combine the multi-scale cubic-mixer with a slicing strategy to generate high-quality results at a much lower computational cost. Experimental results demonstrate that the proposed algorithm performs favorably against state-of-the-art deblurring approaches on several benchmarks and a new ultra-high-definition dataset in terms of accuracy and speed.
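
The frequency-domain round trip described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names (`to_real_imag`, `from_real_imag`, `deblur_in_frequency`) are hypothetical, and the `predictor` argument is a placeholder (identity by default) standing in for the learned network that would estimate the clean Fourier coefficients.

```python
import numpy as np

def to_real_imag(image):
    """FFT over the spatial axes of an (H, W, C) image.

    Returns a real-valued (H, W, 2C) tensor: real parts first, imaginary parts second.
    """
    coeffs = np.fft.fft2(image, axes=(0, 1))
    return np.concatenate([coeffs.real, coeffs.imag], axis=-1)

def from_real_imag(stacked):
    """Inverse of to_real_imag: rebuild complex coefficients and return to pixel space."""
    c = stacked.shape[-1] // 2
    coeffs = stacked[..., :c] + 1j * stacked[..., c:]
    return np.fft.ifft2(coeffs, axes=(0, 1)).real

def deblur_in_frequency(image, predictor=lambda x: x):
    """Apply a (hypothetical) coefficient predictor between the FFT and the inverse FFT."""
    return from_real_imag(predictor(to_real_imag(image)))

# Toy input standing in for a blurred image; the identity predictor changes nothing.
rng = np.random.default_rng(0)
blurred = rng.random((8, 8, 3))
restored = deblur_in_frequency(blurred)
```

Splitting the coefficients into two real-valued halves lets an ordinary real-valued network operate on them, which is one plausible reading of "acts on both the real and imaginary components".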

Paper Structure

This paper contains 13 sections, 8 equations, 8 figures, and 4 tables.

Figures (8)

  • Figure 1: (a) shows the architecture of the proposed single-image deblurring network, which consists of three parts. The first part is a three-path low-resolution (LR) feature-map prediction stream (the multi-scale cubic-mixer) that learns frequency-domain information to predict a set of full-resolution feature maps. The second part learns a high-quality local feature (an attention tensor with 6 channels) using CNNs with PReLU. The last part generates a full-resolution clear image via a slicing scheme. Our proposed algorithm deblurs a UHD image in 25 ms on a single Titan RTX GPU shader. (b) shows the architecture of the Cubic-Mixer. The basic framework is a modified MLP-Mixer, which we extend by one dimension.
  • Figure 2: This figure shows the change in the real/imaginary part of the Fourier coefficients for a pair of clear/blurred images. Note that the Fourier coefficients are normalized (the pixel range is $0\sim10$) and computed on images sampled at $64\times$.
  • Figure 3: This figure shows the spectrum of the output of each yellow block in the top path of the network. With the propagation of data through the network, the output spectrum of the trailing yellow block is closer to the spectrum of the clear image. Note that the output of each yellow block is transformed into a complex tensor before being rendered.
  • Figure 4: This figure shows the characteristics of each feature map in the slicing operation, and it is clear that the details acting on the three groups of feature maps in the color channel are complementary to each other.
  • Figure 5: Qualitative evaluations on G-P (GoPro), R-J (RealBlur-J), R-R (RealBlur-R) and 4K (4KRD) datasets. Our proposed multi-scale cubic-mixer generates much clearer details and sharper edges. (Zoom in for best view)
  • ...and 3 more figures
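
The caption of Figure 1(b) describes the Cubic-Mixer as an MLP-Mixer extended by one dimension, so that mixing happens along $C$, $H$, and $W$. A rough sketch of that idea follows. Everything specific here is an assumption made for illustration: the names `mix_axis` and `cubic_mixer_block`, the single linear map per axis, the ReLU activation, and the residual connections are not taken from the paper.

```python
import numpy as np

def mix_axis(x, weight, axis):
    """Apply a linear map along one axis of x, leaving the other axes untouched."""
    moved = np.moveaxis(x, axis, -1)   # bring the target axis to the end
    mixed = moved @ weight.T           # (..., n) @ (n, m) -> (..., m)
    return np.moveaxis(mixed, -1, axis)

def cubic_mixer_block(x, w_c, w_h, w_w):
    """Mix a (C, H, W) tensor along each of its three axes in turn.

    Assumed design: one square weight matrix per axis, ReLU as a placeholder
    activation, and a residual connection around every mixing step.
    """
    for axis, weight in enumerate((w_c, w_h, w_w)):
        x = x + np.maximum(mix_axis(x, weight, axis), 0.0)
    return x

# Toy feature tensor with C=4, H=5, W=6 and random (untrained) weights.
rng = np.random.default_rng(0)
features = rng.random((4, 5, 6))
out = cubic_mixer_block(
    features,
    rng.standard_normal((4, 4)),
    rng.standard_normal((5, 5)),
    rng.standard_normal((6, 6)),
)
```

Each mixing step costs a matrix product along a single axis, so no pairwise token attention is computed; this matches the abstract's motivation for avoiding self-attention at UHD resolution.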