F2former: When Fractional Fourier Meets Deep Wiener Deconvolution and Selective Frequency Transformer for Image Deblurring

Subhajit Paul, Sahil Kumawat, Ashutosh Gupta, Deepak Mishra

TL;DR

Experimental results show that the performance of the proposed Fractional Fourier Transform (FRFT) approach is superior to other state-of-the-art (SOTA) approaches for motion deblurring as well as defocus deblurring.

Abstract

Recent progress in image deblurring techniques focuses mainly on operating in both frequency and spatial domains using the Fourier transform (FT) properties. However, their performance is limited by the dependency of the FT on stationary signals and its inability to extract spatial-frequency properties. In this paper, we propose a novel approach based on the Fractional Fourier Transform (FRFT), a unified spatial-frequency representation that leverages both spatial and frequency components simultaneously, making it well suited to processing non-stationary signals like images. Specifically, we introduce a Fractional Fourier Transformer (F2former), which combines the classical fractional Fourier based Wiener deconvolution (F2WD) with a multi-branch encoder-decoder transformer based on a new fractional frequency aware transformer block (F2TB). The F2TB consists of a fractional frequency aware self-attention (F2SA), which estimates element-wise product attention based on important frequency components, and a novel feed-forward network based on frequency division multiplexing (FM-FFN), which refines high and low frequency features separately for efficient latent clear image restoration. Experimental results for both motion deblurring and defocus deblurring show that the performance of our proposed method is superior to other state-of-the-art (SOTA) approaches.
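
For readers unfamiliar with the FRFT, one common textbook definition is given here as background. The normalization and the symbols $f$, $t$, $u$, and $K_{\alpha}$ are assumptions chosen for illustration; the paper may use a different convention or parametrize the transform by a fractional order rather than by the angle $\alpha$.

$$
\mathcal{F}_{\alpha}[f](u) = \int_{-\infty}^{\infty} K_{\alpha}(t, u)\, f(t)\, dt,
\qquad
K_{\alpha}(t, u) = \sqrt{\frac{1 - i\cot\alpha}{2\pi}}\;
\exp\!\left( i\,\frac{t^{2} + u^{2}}{2}\cot\alpha \;-\; i\, t\, u \csc\alpha \right),
\quad \alpha \neq n\pi .
$$

Under this convention, $\alpha = \pi/2$ gives $\cot\alpha = 0$ and $\csc\alpha = 1$, so the kernel reduces to $\frac{1}{\sqrt{2\pi}} e^{-i t u}$ and the FRFT coincides with the ordinary (unitary) FT. In the limit $\alpha \to 0$ the transform tends to the identity, so intermediate angles interpolate between the spatial and frequency representations.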

Paper Structure (27 sections, 1 theorem, 31 equations, 22 figures, 5 tables)

This paper contains 27 sections, 1 theorem, 31 equations, 22 figures, 5 tables.

Key Result

Proposition 1

Let $\mathbf{K}_p$ be the estimated blur kernel. Then, for the $j$-th feature, the FWD operator in the FRFT domain can be estimated as [...], where $(\cdot)^H$ is the conjugate transpose operation, $\mathbf{S}_j^{xx}$ and $\mathbf{S}_j^{nn}$ are the auto-correlations of the clean signal and the noise content for the $j$-th feature, $\mathbf{C}$ is a diagonal matrix corresponding to chirp multiplication with diagonal elements ...
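
As background for Proposition 1, the following is a minimal sketch of classical Wiener deconvolution in the ordinary Fourier domain. It is not the paper's FRFT-domain operator: it omits the chirp multiplication, works on a single image rather than on learned features, and assumes circular convolution. The function name `wiener_deconvolve` and the scalar `nsr` (a constant stand-in for the ratio of noise to signal power, playing the role of $\mathbf{S}^{nn}$ over $\mathbf{S}^{xx}$) are hypothetical choices made for illustration.

```python
import numpy as np

def wiener_deconvolve(blurred, kernel, nsr=1e-2):
    """Classical Fourier-domain Wiener deconvolution (circular blur model).

    nsr is an assumed constant noise-to-signal power ratio.
    """
    H = np.fft.fft2(kernel, s=blurred.shape)   # kernel spectrum, zero-padded to image size
    Y = np.fft.fft2(blurred)                   # spectrum of the observed image
    G = np.conj(H) / (np.abs(H) ** 2 + nsr)    # Wiener filter: H^* / (|H|^2 + NSR)
    return np.real(np.fft.ifft2(G * Y))

# Toy example: blur a random image with a normalized 5x5 Gaussian, then restore it.
rng = np.random.default_rng(0)
sharp = rng.random((32, 32))
g = np.exp(-0.5 * (np.arange(5) - 2) ** 2)     # 1-D Gaussian profile, sigma = 1
kernel = np.outer(g, g)
kernel /= kernel.sum()
blurred = np.real(np.fft.ifft2(np.fft.fft2(sharp) * np.fft.fft2(kernel, s=sharp.shape)))
restored = wiener_deconvolve(blurred, kernel, nsr=1e-8)

print("MSE blurred :", np.mean((blurred - sharp) ** 2))
print("MSE restored:", np.mean((restored - sharp) ** 2))
```

The toy example is noise-free, so a very small `nsr` is used. With real noise, a larger value trades sharpness for stability at frequencies where the kernel spectrum is close to zero.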

Figures (22)

  • Figure 1: Comparison of the proposed model with other SOTA methods on the GoPro test dataset in terms of PSNR, floating point operations (FLOPs), and the number of network parameters, which is represented by the area of each circle.
  • Figure 2: For optimal choice of $\alpha$, FRFT inherently separates spatially varying artefacts from non-stationary signals along the fractional order axis $t_{\alpha}$.
  • Figure 3: (a) Overall architecture of F2former. Given the input $\mathbf{X}_p$ for the $p$-th scale, KEB estimates the blur kernel $\mathbf{K}_p$. Given $\mathbf{X}_p$, (c) shallow layers extract features and F2WD refines them by performing deblurring using FRFT-based feature Wiener deconvolution. The refined features $\mathbf{\hat{X}}_p^e$ pass through FHTB blocks for efficient image reconstruction. An FHTB consists of an FRFT-based (d) feature refinement block and $L_p$ (b) F2TBs, each of which consists of two major modules: self-attention estimation in the FRFT domain by F2SA and a frequency-selective feed-forward operation by FM-FFN.
  • Figure 4: Architecture of the F2TB layer. Overview of (a) Fractional Fourier aware Self-Attention (F2SA) and (b) Frequency division multiplexing based Feed Forward Network (FM-FFN).
  • Figure 5: Visual illustration on the GoPro test dataset. The first row shows the blurred image, the images predicted by various methods, and the GT sharp image. The second row shows the residuals of the blurred image and of each predicted sharp image with respect to the GT.
  • ...and 17 more figures
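
The captions above describe FM-FFN as refining high and low frequency features separately. The sketch below illustrates only the general idea of splitting a feature map into low and high frequency bands, using an ideal radial mask in the ordinary Fourier domain. It is an assumption made for illustration, not the paper's FM-FFN: the function `split_low_high` and the fixed `cutoff` are hypothetical, and the paper works in the FRFT domain with learned components.

```python
import numpy as np

def split_low_high(feature, cutoff=0.25):
    """Split a 2-D feature map into low- and high-frequency parts.

    cutoff is an assumed radial threshold in cycles per sample (maximum 0.5 per axis).
    """
    h, w = feature.shape
    fy = np.fft.fftfreq(h)[:, None]            # vertical frequencies in [-0.5, 0.5)
    fx = np.fft.fftfreq(w)[None, :]            # horizontal frequencies in [-0.5, 0.5)
    low_mask = np.sqrt(fy ** 2 + fx ** 2) <= cutoff
    spectrum = np.fft.fft2(feature)
    low = np.real(np.fft.ifft2(spectrum * low_mask))   # keep only the low band
    high = feature - low                                # the remainder is the high band
    return low, high

# Each band could then be refined by its own branch and the results summed.
feature = np.arange(64, dtype=float).reshape(8, 8)
low, high = split_low_high(feature)
print(np.allclose(low + high, feature))
```

Because the high band is defined as the remainder, the two bands always sum back to the input, so any per-band refinement starts from a lossless decomposition.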

Theorems & Definitions (1)

  • Proposition 1