Adaptive High-Pass Kernel Prediction for Efficient Video Deblurring

Bo Ji, Angela Yao

TL;DR

This paper tackles video deblurring by addressing the loss of high-frequency details caused by blur and neural spectral bias. It introduces AHFNet, which explicitly extracts high-frequency information using a dynamic combination of fixed high-pass basis kernels, guided by a coefficient generator, and integrates these HF features into a bidirectional, lightweight deblurring pipeline. Key contributions include a provably HF-preserving kernel combination, the use of rotated basis kernels to capture multi-directional HF content, and demonstrated state-of-the-art performance under low memory budgets with notable inference efficiency. The approach offers practical benefits for hardware-constrained settings, enabling sharper video deblurring without exorbitant memory or compute demands.

Abstract

State-of-the-art video deblurring methods use deep network architectures to recover sharpened video frames. Blurring especially degrades high-frequency (HF) information, yet this aspect is often overlooked by recent models that focus more on architectural design. Recovering these fine details is challenging, partly due to the spectral bias of neural networks, which are inclined towards learning low-frequency functions. To address this, we enforce explicit network structures to capture the fine details and edges. We dynamically predict adaptive high-pass kernels as a linear combination of high-pass basis kernels and use them to extract high-frequency features. This strategy is highly efficient, resulting in a low memory footprint for training and fast run times for inference, all while achieving state-of-the-art results among low-budget models. The code is available at https://github.com/jibo27/AHFNet.
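The kernel-combination idea in the abstract can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the AHFNet code: the basis (a Laplacian, a Sobel kernel, and its 90-degree rotation), the fixed coefficients `alphas` standing in for the output of a coefficient generator, and the helper names `rotate90`, `combine`, and `filter2d`. The sketch only shows that a weighted sum of zero-sum basis kernels still ignores flat regions and still responds to edges.

```python
# Hypothetical 3x3 high-pass basis kernels. Each sums to zero, so each one
# gives no response on constant regions. The paper's actual basis may differ.
LAPLACIAN = [[0, -1, 0], [-1, 4, -1], [0, -1, 0]]
SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]


def rotate90(kernel):
    """Rotate a square kernel by 90 degrees clockwise."""
    return [list(row) for row in zip(*kernel[::-1])]


def combine(basis, coeffs):
    """Return the kernel sum_j coeffs[j] * basis[j] as a 2D list."""
    size = len(basis[0])
    return [
        [sum(a * k[r][c] for a, k in zip(coeffs, basis)) for c in range(size)]
        for r in range(size)
    ]


def filter2d(image, kernel):
    """Valid-mode 2D cross-correlation of an image with a kernel."""
    kh, kw = len(kernel), len(kernel[0])
    out = []
    for r in range(len(image) - kh + 1):
        row = []
        for c in range(len(image[0]) - kw + 1):
            row.append(
                sum(
                    kernel[i][j] * image[r + i][c + j]
                    for i in range(kh)
                    for j in range(kw)
                )
            )
        out.append(row)
    return out


# Rotating a basis kernel adds a second orientation to the basis.
basis = [LAPLACIAN, SOBEL_X, rotate90(SOBEL_X)]

# Assumed stand-in for per-frame coefficients from a coefficient generator.
alphas = [0.5, 0.3, 0.2]
k_t = combine(basis, alphas)

flat = [[7.0] * 5 for _ in range(5)]                      # constant patch
edge = [[0.0, 0.0, 1.0, 1.0, 1.0] for _ in range(5)]      # vertical step edge

flat_response = filter2d(flat, k_t)
edge_response = filter2d(edge, k_t)

print("kernel sum:", sum(sum(row) for row in k_t))
print("flat response:", flat_response)
print("edge response:", edge_response)
```

In a real model the coefficients would be predicted per frame by a learned module and the filtering would run as a batched convolution on feature maps; the pure-Python loops here are only meant to keep the example dependency-free.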

Paper Structure

This paper contains 14 sections, 1 theorem, 10 equations, 7 figures, 5 tables.

Key Result

Proposition 1

Consider $M$ spatial high-pass filters with impulse responses $h_i(x)$ and corresponding frequency responses $H_i(f)$ with cutoff frequencies $f_{ci}$, sorted such that $f_{c1} \leq f_{c2} \leq \dots \leq f_{cM}$. Any linear combination of these filters with non-negative coefficients $\alpha_i$ will also act as a high-pass filter in the spatial domain, with a corresponding frequency response
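A small numerical check can make the statement concrete. It is a toy sketch and not a proof: the two 1D filters `h1` and `h2`, the coefficients `alphas`, and the helper `freq_response` are hypothetical choices for illustration. It relies only on the linearity of the Fourier transform, under which the response of a weighted sum of filters equals the same weighted sum of their responses, and it checks two things: that this linearity holds numerically and that the combined filter still has zero gain at DC while passing the highest frequency.

```python
import cmath
import math


def freq_response(h, f):
    """Discrete-time Fourier transform of impulse response h at
    normalized frequency f (cycles per sample, 0.5 is Nyquist)."""
    return sum(v * cmath.exp(-2j * math.pi * f * n) for n, v in enumerate(h))


# Hypothetical 1D high-pass filters; the taps of each sum to zero.
h1 = [-1.0, 2.0, -1.0]   # second difference
h2 = [0.0, 1.0, -1.0]    # first difference, padded to the same length

# Assumed non-negative combination coefficients.
alphas = [0.7, 0.3]

# Linear combination in the spatial domain.
h = [alphas[0] * a + alphas[1] * b for a, b in zip(h1, h2)]

# Sample frequencies from DC (0) up to Nyquist (0.5).
freqs = [i / 16 for i in range(9)]

combined = [freq_response(h, f) for f in freqs]
weighted = [
    alphas[0] * freq_response(h1, f) + alphas[1] * freq_response(h2, f)
    for f in freqs
]

# Response of the combination vs. combination of the responses.
linearity_error = max(abs(a - b) for a, b in zip(combined, weighted))
dc_gain = abs(combined[0])
nyquist_gain = abs(combined[-1])

print("linearity error:", linearity_error)
print("gain at DC:", dc_gain)
print("gain at Nyquist:", nyquist_gain)
```

The check says nothing about the cutoff frequency of the combined filter; it only confirms, for this one example, that constant signals are rejected and rapidly alternating signals are passed.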

Figures (7)

  • Figure 1: Adaptive high-frequency extraction module ($\mathcal{H}$). We generate the dynamic high-pass kernel $k_t$ for future convolution by performing a linear combination of high-pass basis kernels $\{\tilde{k}_j\}_{j=1}^M$ and the coefficients $\{\alpha_{t,j}\}_{j=1}^M$.
  • Figure 2: Overview of AHFNet. We extract high frequencies using $\mathcal{H}$, which are explicitly utilized for deblurring.
  • Figure 3: Trade-off between PSNR on GOPRO and the training memory cost. The cost is measured using a $256\times 256$ patch size and a batch size of 8 for processing $N_s$ frames (a) and a single frame (b). The dot size represents $N_s$, which is the number of frames per training sequence used in the official implementation.
  • Figure 4: Inference time for a single $1280\times 720$ frame vs. PSNR on the GOPRO dataset (Nah et al., 2017). The dot size represents GMACs. Compared models share a similar training memory footprint. Our method achieves the best PSNR with minimal inference time.
  • Figure 5: Qualitative comparisons to models with a similar training memory footprint.
  • ...and 2 more figures

Theorems & Definitions (1)

  • Proposition 1