InvFusion: Bridging Supervised and Zero-shot Diffusion for Inverse Problems
Noam Elata, Hyungjin Chung, Jong Chul Ye, Tomer Michaeli, Michael Elad
TL;DR
InvFusion addresses the trade-off between zero-shot flexibility and training-based accuracy in diffusion-based solvers for inverse problems. It injects the degradation operator $H$ into the denoiser through a Feature Degradation Layer and joint attention. This degradation-aware architecture lets a single model handle multiple degradations, with state-of-the-art posterior sampling quality and competitive MMSE performance on FFHQ and ImageNet. Beyond sampling, the same framework supports MMSE prediction and Neural Posterior Principal Components estimation, which provides uncertainty quantification across degradations. The work points toward versatile, degradation-aware diffusion solvers for high-fidelity image restoration and related tasks.
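For readers less familiar with the terminology, the three quantities named above can be written side by side. This is a notational sketch rather than the paper's own formulation: the additive noise term $n$ and the subscripted symbols are assumptions introduced here for illustration.

```latex
\begin{align}
y &= Hx + n
  && \text{(measurement under a linear degradation } H\text{)} \\
\hat{x}_{\text{sample}} &\sim p(x \mid y, H)
  && \text{(posterior sampling)} \\
\hat{x}_{\text{MMSE}} &= \mathbb{E}\left[x \mid y, H\right]
  && \text{(MMSE prediction: the posterior mean)} \\
\Sigma_{x \mid y, H} &= \mathbb{E}\left[(x - \hat{x}_{\text{MMSE}})(x - \hat{x}_{\text{MMSE}})^{\top} \mid y, H\right]
  && \text{(posterior covariance)}
\end{align}
```

The posterior principal components are the leading eigenvectors of $\Sigma_{x \mid y, H}$, that is, the directions along which plausible reconstructions vary most.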
Abstract
Diffusion models have demonstrated remarkable capabilities in handling inverse problems, offering high-quality posterior-sampling-based solutions. Despite significant advances, a fundamental trade-off persists in how the conditioned synthesis is performed: zero-shot approaches can accommodate any linear degradation but rely on approximations that reduce accuracy. In contrast, training-based methods model the posterior correctly but cannot adapt to the degradation at test time. Here we introduce InvFusion, the first training-based degradation-aware posterior sampler. InvFusion combines the best of both worlds: the strong performance of supervised approaches and the flexibility of zero-shot methods. This is achieved through a novel architectural design that integrates the degradation operator directly into the diffusion denoiser. We compare InvFusion against existing general-purpose posterior samplers, both degradation-aware zero-shot techniques and blind training-based methods. Experiments on the FFHQ and ImageNet datasets demonstrate state-of-the-art performance. Beyond posterior sampling, we further demonstrate the applicability of our architecture as a general Minimum Mean Square Error predictor and as a Neural Posterior Principal Component estimator.
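To make "linear degradation" concrete, the following is a minimal NumPy sketch of two such operators (inpainting and average-pool downsampling) together with their adjoints, plus a noisy measurement $y = Hx + \sigma n$. It is not the InvFusion architecture or the authors' code: the helper names `make_inpainting`, `make_downsample`, and `degrade`, the Gaussian noise model, and the image size are all assumptions chosen for illustration.

```python
import numpy as np


def make_inpainting(mask):
    """Return (H, Ht) for a masking operator; `mask` is a 0/1 array shaped like the image."""
    def H(x):
        return x * mask

    def Ht(y):
        return y * mask  # element-wise masking is self-adjoint

    return H, Ht


def make_downsample(factor):
    """Return (H, Ht) for average-pool downsampling by an integer factor."""
    def H(x):
        h, w = x.shape
        blocks = x.reshape(h // factor, factor, w // factor, factor)
        return blocks.mean(axis=(1, 3))

    def Ht(y):
        # Adjoint of average pooling: replicate each value over its block,
        # then scale by 1 / factor**2.
        return np.kron(y, np.ones((factor, factor))) / factor**2

    return H, Ht


def degrade(x, H, sigma, rng):
    """Simulate y = H(x) + sigma * n with standard Gaussian n (assumed noise model)."""
    clean = H(x)
    return clean + sigma * rng.standard_normal(clean.shape)


rng = np.random.default_rng(0)
x = rng.standard_normal((8, 8))                     # stand-in for an image
mask = (rng.random((8, 8)) > 0.5).astype(float)     # random inpainting mask

for H, Ht in (make_inpainting(mask), make_downsample(2)):
    y = degrade(x, H, sigma=0.05, rng=rng)
    probe = rng.standard_normal(y.shape)
    # Adjoint (dot-product) test: <H x, p> must equal <x, H^T p>.
    assert np.isclose(np.sum(H(x) * probe), np.sum(x * Ht(probe)))
```

Representing each degradation as a pair of callables rather than an explicit matrix keeps the sketch usable at image scale, and the dot-product test is a quick way to check that a hand-written adjoint matches its forward operator.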
