DeepRFTv2: Kernel-level Learning for Image Deblurring

Xintian Mao, Haofei Song, Yin-Nian Liu, Qingli Li, Yan Wang

TL;DR

DeepRFTv2 addresses the kernel-level nature of blur by introducing a Fourier Kernel Estimator (FKE) and coupling it with a Decoupled Multi-Scale UNet (DMS-UNet) to perform end-to-end kernel-level learning for deblurring. By operating in Fourier space, FKE converts spatial convolution into a multiplication on frequency features, which enables global kernel estimation without additional supervision and lets the estimated kernel be convolved directly with learned features. The DMS-UNet design incorporates reversible sub-units to enable efficient multi-scale processing and mitigate information aliasing, delivering strong empirical performance across motion and defocus blur benchmarks. The work demonstrates that kernel-level learning yields physically meaningful kernels and superior restoration quality, with potential applicability to other kernel-related image restoration tasks.

Abstract

It is well known that a network that aims to learn how to deblur should understand the blur process. Blur naturally arises from the convolution of the sharp image with the blur kernel. Thus, allowing the network to learn the blur process at the kernel level can significantly improve image deblurring performance. However, current deep networks remain at the pixel-level learning stage, performing either end-to-end pixel-level restoration or stage-wise pseudo kernel-level restoration, and thus fail to enable the deblur model to understand the essence of blur. To this end, we propose the Fourier Kernel Estimator (FKE), which considers the activation operation in Fourier space and converts the convolution problem in the spatial domain into a multiplication problem in Fourier space. Our FKE, jointly optimized with the deblur model, enables the network to learn the kernel-level blur process with low complexity and without any additional supervision. Furthermore, we change the convolution object of the kernel from the "image" to the network-extracted "feature", whose rich semantic and structural information is better suited to learning the blur process. By convolving the feature with the estimated kernel, our model can learn the essence of blur at the kernel level. To further improve the efficiency of feature extraction, we design a decoupled multi-scale architecture with multiple hierarchical sub-UNets and a reversible strategy, which allows better multi-scale encoding and decoding with low training memory. Extensive experiments indicate that our method achieves state-of-the-art motion deblurring results and shows potential for handling other kernel-related problems. Analysis also shows that our kernel estimator is able to learn physically meaningful kernels. The code will be available at https://github.com/DeepMed-Lab-ECNU/Single-Image-Deblur.
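The abstract's central observation, that convolution in the spatial domain becomes multiplication in Fourier space, can be checked numerically. The following is a minimal NumPy sketch of the convolution theorem only, not the authors' FKE: the function names, the single-channel 2-D "feature", and the circular (wrap-around) boundary handling are all assumptions made for illustration.

```python
import numpy as np


def circular_conv2d_direct(x, k):
    """Circular 2-D convolution computed tap by tap in the spatial domain."""
    out = np.zeros(x.shape, dtype=float)
    for dy in range(k.shape[0]):
        for dx in range(k.shape[1]):
            # np.roll shifts x by (dy, dx) with wrap-around, giving x[n - m].
            out += k[dy, dx] * np.roll(x, shift=(dy, dx), axis=(0, 1))
    return out


def circular_conv2d_fft(x, k):
    """The same convolution as an element-wise product in Fourier space."""
    # Zero-pad the kernel to the feature size before transforming it.
    k_spec = np.fft.fft2(k, s=x.shape)
    x_spec = np.fft.fft2(x)
    return np.real(np.fft.ifft2(x_spec * k_spec))


# Hypothetical stand-ins for a feature map and an estimated blur kernel.
rng = np.random.default_rng(0)
feature = rng.standard_normal((16, 16))
kernel = rng.random((5, 5))
kernel /= kernel.sum()  # normalize so the kernel sums to one

max_err = np.max(np.abs(circular_conv2d_direct(feature, kernel)
                        - circular_conv2d_fft(feature, kernel)))
print("max abs difference:", max_err)
```

The direct version loops over every kernel tap, whereas the Fourier version costs the same regardless of kernel size, which is one reason a frequency-domain formulation suits kernels with global support.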

Paper Structure

This paper contains 20 sections, 13 equations, 8 figures, 7 tables.

Figures (8)

  • Figure 1: Three deblurring model architectures. (a) One-stage deblurring models [Nah2017deep, Cho2021rethinking, Zamir2021restormer, Chen2022simple] directly predict blur patterns using neural networks in the spatial domain under an end-to-end training scheme. (b) Kernel prior-assisted models [fang2023UFPNet, Fang_PGDN] employ a two-stage strategy: first pre-training a reblur model to obtain kernel priors, then incorporating these priors as spatial features into the deblurring network. (c) Our method integrates the deblurring model and kernel estimator into a unified end-to-end framework, where the estimated kernels are physically applied in the convolution process.
  • Figure 2: Visualizations of various blurred images $\mathrm{B}$ and $\sigma(\mathcal{F}^{-1}(\mathrm{ReLU}(\mathcal{F}(\mathrm{B}))) - \mathrm{B} / 2)$, where $\sigma$ denotes a cyclic shift by $[H/2, W/2]$.
  • Figure 3: (a) Coarse-to-fine architecture [Nah2017deep, Tao2018scale]; (b) UNet architecture [Wang2022uformer, Zamir2021restormer, Chen2022simple]; (c) MIMO-UNet architecture [Cho2021rethinking, XintianMao2023DeepRFT]; (d) Our DeepRFTv2: DMS-UNet with FKE. $\mathrm{B}_n$ and $\hat{\mathrm{S}}_n$ represent the blurred and restored images at the $n$-th scale.
  • Figure 4: (a) Reversible-UNet for image restoration with multiple reversible sub-encoders and sub-decoders. Each sub-encoder/decoder is composed of 3 level modules with NAFBlock [Chen2022simple] and NAFEVSBlock. (b) Fourier Kernel Estimation with multiple reversible sub-resnets. Each sub-resnet is composed of 4 residual modules with VSBlock. (c) VSBlock: Visual Scan Block from EVSSM [kong2025EVSSM]. (d) NAFEVSBlock: a block combining NAFBlock [Chen2022simple] and VSBlock [kong2025EVSSM], similar to ALGBlock [Gao2023ALGNet].
  • Figure 5: Visual comparison of single-image motion deblurring approaches on GoPro [Nah2017deep] and RealBlur-J [Rim2020real].
  • ...and 3 more figures
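
The quantity visualized in Figure 2, $\sigma(\mathcal{F}^{-1}(\mathrm{ReLU}(\mathcal{F}(\mathrm{B}))) - \mathrm{B}/2)$, can be sketched in a few lines of NumPy. This is an illustrative sketch under stated assumptions, not the authors' code: the caption does not say how ReLU acts on complex values or how a complex result is displayed, so the sketch assumes ReLU is applied to the real and imaginary parts separately and that the real part is kept. The function name `fourier_relu_residual` is hypothetical.

```python
import numpy as np


def fourier_relu_residual(blur):
    """Compute sigma(F^{-1}(ReLU(F(B))) - B/2) for a 2-D grayscale image B."""
    h, w = blur.shape
    spec = np.fft.fft2(blur)
    # Assumption: ReLU acts on the real and imaginary parts independently.
    spec_relu = np.maximum(spec.real, 0.0) + 1j * np.maximum(spec.imag, 0.0)
    back = np.fft.ifft2(spec_relu)
    # Assumption: keep only the real part of the inverse transform.
    residual = back.real - blur / 2.0
    # sigma: cyclic shift by [H/2, W/2], which centers the zero-offset position.
    return np.roll(residual, shift=(h // 2, w // 2), axis=(0, 1))


# Hypothetical input: a random array standing in for a blurred image.
rng = np.random.default_rng(0)
blurred = rng.random((32, 32))
vis = fourier_relu_residual(blurred)
print(vis.shape)
```

Under the per-component assumption, $\mathrm{ReLU}(z) = (z + |z|)/2$ holds for each of the real and imaginary parts, so the inverse transform contains exactly $\mathrm{B}/2$ from the linear term. Subtracting $\mathrm{B}/2$ therefore leaves only the contribution of the absolute-value spectrum, which is the nonlinear part of the activation.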