KBNet: Kernel Basis Network for Image Restoration

Yi Zhang, Dasong Li, Xiaoyu Shi, Dailan He, Kangning Song, Xiaogang Wang, Hongwei Qin, Hongsheng Li

TL;DR

KBNet tackles adaptive spatial information aggregation for image restoration by introducing Kernel Basis Attention (KBA), which uses learnable kernel bases and per-pixel fusion to capture diverse local patterns. It couples KBA with a Multi-axis Feature Fusion (MFF) block to jointly encode channel-wise, spatial-invariant, and pixel-adaptive features, all integrated into a U-Net backbone. The approach delivers state-of-the-art results across denoising, deraining, and deblurring benchmarks while reducing computational cost relative to prior SOTA methods. Together, these components provide an efficient framework that blends convolutional inductive biases with adaptive spatial processing for robust low-level vision tasks.

Abstract

How to aggregate spatial information plays an essential role in learning-based image restoration. Most existing CNN-based networks adopt static convolutional kernels to encode spatial information, which cannot aggregate spatial information adaptively. Recent transformer-based architectures achieve adaptive spatial aggregation, but they lack the desirable inductive biases of convolutions and incur heavy computational costs. In this paper, we propose a kernel basis attention (KBA) module, which introduces learnable kernel bases to model representative image patterns for spatial information aggregation. Different kernel bases are trained to model different local structures. At each spatial location, they are linearly and adaptively fused by predicted pixel-wise coefficients to obtain aggregation weights. Based on the KBA module, we further design a multi-axis feature fusion (MFF) block to encode and fuse channel-wise, spatial-invariant, and pixel-adaptive features for image restoration. Our model, named kernel basis network (KBNet), achieves state-of-the-art performance on more than ten benchmarks across image denoising, deraining, and deblurring tasks while requiring less computation than previous SOTA methods.
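The per-pixel fusion of kernel bases described above can be sketched in a few lines of NumPy. This is a minimal, simplified sketch under stated assumptions, not the authors' implementation: each kernel basis is reduced to one k×k kernel per channel (depthwise-style), the fusion coefficient map F is predicted by a single hypothetical 1×1 linear projection `proj` standing in for whatever predictor the paper actually uses, and the enhanced feature map `x_e` is simply taken as an input. The function name `kernel_basis_attention` and all parameter names are illustrative; the symbols follow Figure 1 (input X, coefficients F, bases W, fused weights M, enhanced features X_e, output X').

```python
import numpy as np


def kernel_basis_attention(x, x_e, bases, proj):
    """Simplified kernel basis attention (KBA) sketch, not a reference implementation.

    x:     (H, W, C) input feature map X, used to predict fusion coefficients
    x_e:   (H, W, C) enhanced feature map X_e, whose neighborhoods are aggregated
    bases: (N, k, k, C) learnable kernel bases (W in Figure 1); ASSUMPTION:
           one k x k kernel per channel (depthwise-style), k odd
    proj:  (C, N) HYPOTHETICAL 1x1 projection standing in for the coefficient predictor
    Returns the output feature map X' with shape (H, W, C).
    """
    H, W, C = x.shape
    k = bases.shape[1]
    p = k // 2

    # 1) Predict the fusion coefficient map F: N coefficients per pixel -> (H, W, N)
    F = x @ proj

    # 2) Linearly fuse the bases at every location: M[h, w] = sum_n F[h, w, n] * bases[n]
    M = np.einsum("hwn,nijc->hwijc", F, bases)  # (H, W, k, k, C)

    # 3) Gather the k x k neighborhood of every pixel of X_e (zero padding)
    xp = np.pad(x_e, ((p, p), (p, p), (0, 0)))
    rows = []
    for i in range(k):
        cols = [xp[i:i + H, j:j + W] for j in range(k)]  # k arrays of (H, W, C)
        rows.append(np.stack(cols, axis=2))              # (H, W, k, C)
    patches = np.stack(rows, axis=2)                     # (H, W, k, k, C)

    # 4) Pixel-adaptive aggregation: each pixel applies its own fused kernel M[h, w]
    return np.einsum("hwijc,hwijc->hwc", M, patches)
```

As a quick sanity check, a single basis that is a centered delta kernel combined with coefficients that are all 1 makes the output reproduce `x_e` exactly. In this sketch the predictor only has to output N numbers per pixel rather than a full k×k×C kernel, because every per-pixel kernel is a linear combination of the N shared bases.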

Paper Structure

This paper contains 16 sections, 2 equations, 10 figures, and 9 tables.

Figures (10)

  • Figure 1: An overview of the kernel basis attention (KBA) module. Given the input feature map $X$, the KBA module first predicts the fusion coefficient map $F$ to linearly fuse the learnable kernel bases $W$ at each location. Then, the fused kernel weights $M$ adaptively encode the local neighborhood of the enhanced feature map $X_e$ to produce the output feature map $X'$.
  • Figure 2: An overview of the Multi-axis Feature Fusion (MFF) block. Channel attention, depthwise convolution, and our KBA module process the input features in parallel. The outputs of the three operations are fused by point-wise multiplication.
  • Figure 3: PSNR vs. MACs of different methods on Gaussian denoising of color images. PSNRs are measured on the Urban100 dataset with noise level $\sigma=50$.
  • Figure 4: Visualization results of Gaussian denoising of color images on the Urban100 dataset huang2015single_urban100. KBNet recovers finer textures.
  • Figure 5: Visualization of denoising results on SenseNoise dataset zhang2021IDR. Our method produces clearer edges and more faithful colors.
  • ...and 5 more figures
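The MFF block in Figure 2 runs three branches in parallel and multiplies their outputs element-wise. The sketch below illustrates only that fusion pattern, under loudly stated assumptions: the channel-attention branch uses a squeeze-and-excitation form (global average pooling, bottleneck, sigmoid), which may differ from the paper's; the depthwise convolution uses zero padding; the KBA branch is passed in as an arbitrary callable `kba_branch`; and any normalization, projection, or residual layers of the real block are omitted. All function names (`channel_attention`, `depthwise_conv`, `mff_block`) are hypothetical. The three branches presumably correspond to the channel-wise, spatial-invariant, and pixel-adaptive features named in the abstract.

```python
import numpy as np


def channel_attention(x, w1, w2):
    """Channel-wise branch. ASSUMPTION: squeeze-and-excitation style gating.

    x: (H, W, C), w1: (C, Cr), w2: (Cr, C).
    """
    s = x.mean(axis=(0, 1))                 # (C,) global average pooling
    z = np.maximum(s @ w1, 0.0)             # (Cr,) bottleneck + ReLU
    gate = 1.0 / (1.0 + np.exp(-(z @ w2)))  # (C,) sigmoid gates in (0, 1)
    return x * gate                         # rescale every channel


def depthwise_conv(x, kernel):
    """Spatially invariant branch: one k x k kernel per channel, zero padding.

    x: (H, W, C), kernel: (k, k, C) with k odd.
    """
    H, W, _ = x.shape
    k = kernel.shape[0]
    p = k // 2
    xp = np.pad(x, ((p, p), (p, p), (0, 0)))
    out = np.zeros_like(x, dtype=float)
    for i in range(k):
        for j in range(k):
            # Same weights at every pixel (cross-correlation, as in CNN layers)
            out += kernel[i, j] * xp[i:i + H, j:j + W]
    return out


def mff_block(x, ca_params, dw_kernel, kba_branch):
    """Run the three branches in parallel and fuse them by point-wise multiplication."""
    ca = channel_attention(x, *ca_params)  # channel-wise features
    dw = depthwise_conv(x, dw_kernel)      # spatial-invariant features
    ka = kba_branch(x)                     # pixel-adaptive features (KBA stand-in)
    return ca * dw * ka
```

Any function that maps an (H, W, C) array to the same shape can be plugged in as `kba_branch`; for a quick shape check, a placeholder such as `lambda t: t` works, while a real model would use a pixel-adaptive KBA operator there.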