PG-ControlNet: A Physics-Guided ControlNet for Generative Spatially Varying Image Deblurring
Hakki Motorcu, Mujdat Cetin
TL;DR
PG-ControlNet is a physics-guided conditional diffusion framework for the ill-posed problem of spatially varying deblurring. It represents the blur field as a dense, region-adaptive set of local kernels, compresses those kernels via PCA into a 128-dimensional descriptor field aligned to the image grid, and uses that field to condition a ControlNet-based diffusion model built on a frozen Stable Diffusion backbone. Only the hint encoder is trained, which enables posterior sampling that enforces data fidelity while preserving perceptual realism. In experiments on 512×512 COCO-2017 data under challenging nonuniform blur, the method achieves better perceptual metrics (LPIPS, FID, FSIM) than both model-based and diffusion baselines while remaining competitive on fidelity. The approach offers a practical route to combining physical measurements with generative priors, with potential applications in microscopy, aerial imaging, and depth-aware photography.
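The PCA compression step described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the helper names (`gaussian_kernel`, `fit_kernel_pca`, `project_kernel_field`), the 15x15 kernel size, the anisotropic Gaussian toy kernels, and the 8x8 field are all assumptions chosen for illustration. Only the 128-dimensional descriptor size comes from the text.

```python
import numpy as np

def gaussian_kernel(k, sigma_x, sigma_y, theta):
    """Toy anisotropic Gaussian blur kernel of size k x k, normalized to sum to 1."""
    ax = np.arange(k) - (k - 1) / 2.0
    xx, yy = np.meshgrid(ax, ax)
    xr = np.cos(theta) * xx + np.sin(theta) * yy
    yr = -np.sin(theta) * xx + np.cos(theta) * yy
    ker = np.exp(-0.5 * ((xr / sigma_x) ** 2 + (yr / sigma_y) ** 2))
    return ker / ker.sum()

def fit_kernel_pca(kernels, n_components):
    """Fit a PCA basis to kernels of shape (N, k, k).

    Returns the mean kernel (k*k,) and a basis (n_components, k*k)
    whose rows are orthonormal principal directions.
    """
    flat = kernels.reshape(kernels.shape[0], -1)
    mean = flat.mean(axis=0)
    # Rows of vt are the principal directions of the centered data.
    _, _, vt = np.linalg.svd(flat - mean, full_matrices=False)
    return mean, vt[:n_components]

def project_kernel_field(kernel_field, mean, basis):
    """Map a per-pixel kernel field (H, W, k, k) to a descriptor field (H, W, C)."""
    h, w = kernel_field.shape[:2]
    flat = kernel_field.reshape(h * w, -1)
    coeffs = (flat - mean) @ basis.T
    return coeffs.reshape(h, w, basis.shape[0])

def random_kernel(rng, k):
    """Draw one toy kernel with random widths and orientation (assumed distribution)."""
    return gaussian_kernel(k, rng.uniform(0.5, 4.0), rng.uniform(0.5, 4.0),
                           rng.uniform(0.0, np.pi))

rng = np.random.default_rng(0)
K = 15  # assumed kernel size; the section does not state it

# Fit the basis on a toy collection of kernels.
train = np.stack([random_kernel(rng, K) for _ in range(400)])
mean, basis = fit_kernel_pca(train, n_components=128)

# Build a small 8x8 field with one kernel per pixel and compress it.
field = np.stack([random_kernel(rng, K) for _ in range(64)]).reshape(8, 8, K, K)
desc = project_kernel_field(field, mean, basis)
print(desc.shape)  # (8, 8, 128)
```

Because the descriptor field shares the spatial layout of the image, it can be passed to a conditioning branch in the same way as any other image-aligned hint, with the PCA coefficients as channels.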
Abstract
Spatially varying image deblurring remains a fundamentally ill-posed problem, especially when degradations arise from complex mixtures of motion and other forms of blur under significant noise. State-of-the-art learning-based approaches generally fall into two paradigms: model-based deep unrolling methods, which enforce physical constraints by modeling the degradations but often produce over-smoothed, artifact-laden textures, and generative models, which achieve superior perceptual quality yet hallucinate details due to weak physical constraints. In this paper, we propose a framework that reconciles these paradigms by taming a powerful generative prior with explicit, dense physical constraints. Rather than oversimplifying the degradation field, we model it as a dense continuum of high-dimensional compressed kernels, so that minute variations in motion and other degradation patterns are captured. We use this rich descriptor field to condition a ControlNet architecture, strongly guiding the diffusion sampling process. Extensive experiments demonstrate that our method bridges the gap between physical accuracy and perceptual realism, outperforming state-of-the-art model-based methods as well as generative baselines in challenging, severely blurred scenarios.
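To make the degradation setting concrete, the following is a minimal NumPy sketch of a generic spatially varying blur forward model with additive Gaussian noise. The function name `spatially_varying_blur`, the gather-style (per-output-pixel) formulation, the reflect padding, and the single-channel image are assumptions for illustration; the abstract does not specify the exact operator used in the paper.

```python
import numpy as np

def spatially_varying_blur(image, kernel_field, noise_sigma=0.0, rng=None):
    """Apply a different blur kernel at every pixel, then add Gaussian noise.

    image:        (H, W) grayscale image
    kernel_field: (H, W, k, k) with k odd; kernel_field[i, j] weights the
                  k x k neighborhood centered at pixel (i, j)
    Returns the degraded image of shape (H, W).
    """
    k = kernel_field.shape[-1]
    r = k // 2
    padded = np.pad(image, r, mode="reflect")
    # windows[i, j] is the k x k neighborhood of pixel (i, j) in the padded image.
    windows = np.lib.stride_tricks.sliding_window_view(padded, (k, k))
    # Weighted sum of each neighborhood with its own kernel (no kernel flip).
    blurred = np.einsum("ijuv,ijuv->ij", windows, kernel_field)
    if noise_sigma > 0:
        rng = np.random.default_rng() if rng is None else rng
        blurred = blurred + noise_sigma * rng.standard_normal(blurred.shape)
    return blurred

# Usage: a delta kernel at every pixel leaves the image unchanged.
img = np.arange(36, dtype=float).reshape(6, 6)
delta = np.zeros((6, 6, 3, 3))
delta[..., 1, 1] = 1.0
out = spatially_varying_blur(img, delta)
```

In this sketch the kernel field has one k x k kernel per pixel, which is the dense representation that a compressed descriptor field would stand in for.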
