KernelFusion: Assumption-Free Blind Super-Resolution via Patch Diffusion

Oliver Heinimann, Assaf Shocher, Tal Zimbalist, Michal Irani

TL;DR

KernelFusion tackles blind super-resolution under unknown, complex downscaling kernels with a zero-shot, diffusion-based approach that learns an image-specific patch distribution from a single low-resolution (LR) image and jointly reconstructs the high-resolution (HR) image while estimating the SR-kernel. The method operates in two phases: Phase 1 trains a patch-diffusion model on the LR image, and Phase 2 runs reverse diffusion at high resolution under a consistency loss, with an implicit neural representation modeling the SR-kernel. It reports state-of-the-art results on challenging degradations and recovers non-Gaussian kernels that existing kernel-estimation methods cannot handle, marking a shift toward an assumption-free Blind-SR paradigm. The approach needs no external priors or pre-trained models, at the cost of per-image training time and some limitations under severe LR artifacts; hybrid methods that integrate external information are a natural next step.

Abstract

Traditional super-resolution (SR) methods assume an "ideal" downscaling SR-kernel (e.g., bicubic downscaling) between the high-resolution (HR) image and the low-resolution (LR) image. Such methods fail once the LR images are generated differently. Current blind-SR methods aim to remove this assumption, but are still fundamentally restricted to rather simplistic downscaling SR-kernels (e.g., anisotropic Gaussian kernels), and fail on more complex (out of distribution) downscaling degradations. However, using the correct SR-kernel is often more important than using a sophisticated SR algorithm. In "KernelFusion" we introduce a zero-shot diffusion-based method that makes no assumptions about the kernel. Our method recovers the unique image-specific SR-kernel directly from the LR input image, while simultaneously recovering its corresponding HR image. KernelFusion exploits the principle that the correct SR-kernel is the one that maximizes patch similarity across different scales of the LR image. We first train an image-specific patch-based diffusion model on the single LR input image, capturing its unique internal patch statistics. We then reconstruct a larger HR image with the same learned patch distribution, while simultaneously recovering the correct downscaling SR-kernel that maintains this cross-scale relation between the HR and LR images. Empirical results show that KernelFusion vastly outperforms all SR baselines on complex downscaling degradations, where existing SotA Blind-SR methods fail miserably. By breaking free from predefined kernel assumptions, KernelFusion pushes Blind-SR into a new assumption-free paradigm, handling downscaling kernels previously thought impossible.
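
To make the HR/LR relation concrete, the following is a minimal sketch of the standard blind-SR degradation model (blur the HR image with the SR-kernel, then subsample) and of a consistency check built on it. It is not the authors' implementation: the function names `downscale_with_kernel` and `consistency_loss`, the use of plain correlation without a kernel flip, the "valid" border handling, and the mean-squared-error form of the loss are all assumptions made for illustration.

```python
def downscale_with_kernel(hr, kernel, scale):
    """Blur `hr` with `kernel` (valid correlation, no flip) and subsample by `scale`.

    `hr` and `kernel` are lists of rows of floats. This is an illustrative
    degradation model, not the paper's exact operator.
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h = (len(hr) - kh) // scale + 1
    out_w = (len(hr[0]) - kw) // scale + 1
    lr = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            top, left = i * scale, j * scale
            # Weighted sum of the HR window that maps to LR pixel (i, j).
            acc = 0.0
            for u in range(kh):
                for v in range(kw):
                    acc += kernel[u][v] * hr[top + u][left + v]
            row.append(acc)
        lr.append(row)
    return lr


def consistency_loss(hr, kernel, lr, scale):
    """Mean squared error between the re-degraded HR estimate and the LR input.

    MSE is an assumed choice; the source only states that a consistency loss is used.
    """
    pred = downscale_with_kernel(hr, kernel, scale)
    diffs = [(p - t) ** 2 for pr, tr in zip(pred, lr) for p, t in zip(pr, tr)]
    return sum(diffs) / len(diffs)


# Toy example: a checkerboard HR image downscaled 2x with a box kernel.
hr = [[0.0, 4.0, 0.0, 4.0],
      [4.0, 0.0, 4.0, 0.0],
      [0.0, 4.0, 0.0, 4.0],
      [4.0, 0.0, 4.0, 0.0]]
box = [[0.25, 0.25], [0.25, 0.25]]
delta = [[1.0, 0.0], [0.0, 0.0]]

lr = downscale_with_kernel(hr, box, scale=2)
loss_right_kernel = consistency_loss(hr, box, lr, scale=2)
loss_wrong_kernel = consistency_loss(hr, delta, lr, scale=2)
```

In this toy case the kernel that actually produced the LR image gives zero loss, while a mismatched kernel does not. On its own, though, this loss cannot separate a wrong HR estimate from a wrong kernel, which is why the abstract pairs it with the learned patch distribution.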

Paper Structure

This paper contains 25 sections, 2 equations, 7 figures, 2 tables, 1 algorithm.

Figures (7)

  • Figure 1: The importance of an accurate SR-Kernel. (A) SotA SR-methods fail on complex downscaling kernels outside their training distribution, performing even worse than interpolation on such kernels. (B) Existing SR-kernel estimation methods cannot handle complex downscaling kernels. KernelFusion is the only method capable of estimating arbitrarily challenging SR-kernels.
  • Figure 2: Method Overview. Our approach consists of 2 stages: Phase 1: We train a diffusion model (PD) to learn the patch distribution of a single image. Phase 2: We perform blind SR and kernel estimation simultaneously. In particular, we use the trained PD to shift the HR guess toward the patch distribution of the LR input. A refinement U-Net and an implicit kernel representation model are trained jointly under a consistency loss, ensuring that convolving the estimated HR image with the learned kernel reproduces the original LR image.
  • Figure 3: Blind-SR comparison on the DIV2KFK dataset (4× upscaling). Each row corresponds to a different degraded image from DIV2KFK, while each column shows the output of a different method at a 4× upscaling factor. Notably, our method reduces doubling artifacts in structured patterns (e.g., aerial road scene, 4th row), demonstrating its effectiveness in restoring fine details and mitigating motion effects.
  • Figure 4: Comparison of estimated kernels from different Blind-SR methods. The top row represents the ground-truth (GT) degradation kernels, while each subsequent row corresponds to the estimated kernels from different SR methods, including our approach, KernelGAN, IKR, MLMC, and DKP. Our method demonstrates superior flexibility in recovering complex, non-Gaussian degradations, accurately capturing kernel structures across a diverse range of degradations.
  • Figure 5: Kernel estimation results on Blind144. The top row displays the 12 ground-truth (GT) degradation kernels, including real-world motion blur kernels from Levin et al. (2009), an anisotropic Gaussian kernel, and three synthetic non-natural kernels (L-shape, empty square, and filled square). The subsequent rows show our method’s estimated kernels for each of the 12 kernels applied to the first 12 images of the DIV2K validation set. Our approach successfully captures a diverse range of degradations, including complex structured kernels, demonstrating its robustness and adaptability in blind SR kernel estimation.
  • ...and 2 more figures
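
The implicit kernel representation mentioned in the Figure 2 caption can be pictured as a small network that maps a 2D coordinate to a kernel value, so the kernel is not tied to any parametric family such as Gaussians. The sketch below is a hypothetical illustration only: the single hidden layer, the `tanh` activation, the random initialization, and the softmax normalization (which forces non-negative values summing to 1) are assumptions, not details taken from the paper, and the names `init_mlp` and `kernel_from_mlp` are invented here.

```python
import math
import random


def init_mlp(hidden=16, seed=0):
    """Random weights for a one-hidden-layer coordinate MLP (hypothetical architecture)."""
    rng = random.Random(seed)
    w1 = [[rng.gauss(0, 1) for _ in range(2)] for _ in range(hidden)]
    b1 = [0.0] * hidden
    w2 = [rng.gauss(0, 1) for _ in range(hidden)]
    return w1, b1, w2


def kernel_from_mlp(params, size):
    """Evaluate the MLP on a size x size grid (size >= 2) and normalize to a kernel."""
    w1, b1, w2 = params
    logits = []
    for i in range(size):
        for j in range(size):
            # Map pixel indices to coordinates in [-1, 1].
            y = 2 * i / (size - 1) - 1
            x = 2 * j / (size - 1) - 1
            hidden = [math.tanh(w[0] * x + w[1] * y + b) for w, b in zip(w1, b1)]
            logits.append(sum(h * w for h, w in zip(hidden, w2)))
    # Softmax over all positions (assumed normalization): values are
    # positive and sum to 1, so the kernel preserves mean intensity.
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    flat = [e / total for e in exps]
    return [flat[r * size:(r + 1) * size] for r in range(size)]


kernel = kernel_from_mlp(init_mlp(), size=5)
```

In a joint optimization of the kind the caption describes, the weights of such a network would be updated by gradients of the consistency loss; that training loop is omitted here because it would require an autodiff library.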