InvFusion: Bridging Supervised and Zero-shot Diffusion for Inverse Problems

Noam Elata, Hyungjin Chung, Jong Chul Ye, Tomer Michaeli, Michael Elad

TL;DR

InvFusion addresses the trade-off between zero-shot flexibility and training-based accuracy in diffusion-based inverse problem solvers by injecting the degradation operator $H$ into the denoiser through a Feature Degradation Layer and joint attention. This degradation-aware architecture lets a single model handle multiple degradations, with state-of-the-art posterior sampling quality and competitive MMSE performance on the FFHQ and ImageNet datasets. Beyond sampling, the same architecture supports MMSE prediction and Neural Posterior Principal Component (NPPC) estimation, which provides uncertainty quantification across degradations. The result is a versatile, degradation-aware diffusion solver aimed at high-fidelity image restoration and related tasks.

Abstract

Diffusion Models have demonstrated remarkable capabilities in handling inverse problems, offering high-quality posterior-sampling-based solutions. Despite significant advances, a fundamental trade-off persists regarding the way the conditioned synthesis is employed: Zero-shot approaches can accommodate any linear degradation but rely on approximations that reduce accuracy. In contrast, training-based methods model the posterior correctly, but cannot adapt to the degradation at test-time. Here we introduce InvFusion, the first training-based degradation-aware posterior sampler. InvFusion combines the best of both worlds -- the strong performance of supervised approaches and the flexibility of zero-shot methods. This is achieved through a novel architectural design that seamlessly integrates the degradation operator directly into the diffusion denoiser. We compare InvFusion against existing general-purpose posterior samplers, both degradation-aware zero-shot techniques and blind training-based methods. Experiments on the FFHQ and ImageNet datasets demonstrate state-of-the-art performance. Beyond posterior sampling, we further demonstrate the applicability of our architecture, operating as a general Minimum Mean Square Error predictor, and as a Neural Posterior Principal Component estimator.
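To make "linear degradation" concrete, the sketch below implements one such operator, average-pool downsampling (a simple super-resolution degradation), together with its adjoint. It is an illustration only: the function names `downsample` and `downsample_adjoint`, the 8x8 image size, and the factor of 2 are assumptions chosen for the example and do not come from the paper.

```python
import numpy as np

def downsample(x, factor):
    """Hypothetical linear operator H: average-pool a 2D image by `factor`."""
    h, w = x.shape
    return x.reshape(h // factor, factor, w // factor, factor).mean(axis=(1, 3))

def downsample_adjoint(y, factor):
    """Adjoint H^T of average pooling: spread each value over its block,
    scaled by 1 / factor**2."""
    return np.repeat(np.repeat(y, factor, axis=0), factor, axis=1) / factor**2

rng = np.random.default_rng(0)
x = rng.standard_normal((8, 8))          # stand-in for a clean image
y_sr = downsample(x, 2)                  # measurement y = Hx

# Linearity check: H(a*x1 + b*x2) == a*H(x1) + b*H(x2)
x2 = rng.standard_normal((8, 8))
lin_lhs = downsample(2.0 * x + 3.0 * x2, 2)
lin_rhs = 2.0 * downsample(x, 2) + 3.0 * downsample(x2, 2)

# Adjoint check: <Hx, v> == <x, H^T v>
v = rng.standard_normal(y_sr.shape)
lhs = np.sum(downsample(x, 2) * v)
rhs = np.sum(x * downsample_adjoint(v, 2))
```

Representing $H$ as a pair of functions rather than an explicit matrix keeps the example cheap for image-sized inputs; any degradation that passes the linearity and adjoint checks above fits the same interface.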

Paper Structure

This paper contains 35 sections, 8 equations, 7 figures, 15 tables.

Figures (7)

  • Figure 1: Examples of posterior samples from our degradation-aware diffusion model. A single model can restore multiple degradations, such as in-painting, de-blurring and super-resolution, with high image fidelity and realism, by integrating the degradation operator into the model's architecture.
  • Figure 2: A diagram of the InvFusion Block. Our block contains a Feature Degradation Layer, which incorporates the operator ${\bm{H}}$ into the architecture by applying ${\bm{H}}$ on the activations and comparing them with the measurements ${\mathbf{y}} = {\bm{H}} {\mathbf{x}}$.
  • Figure 3: Examples comparing zero-shot and training-based inverse problem solvers. For each degradation, the top row is an example from FFHQ64 and the bottom row from ImageNet64. Images generated using training-based methods use deterministic samplers and identical seeds, highlighting the subtle effects of the different training algorithms.
  • Figure 3: Comparison of restoration on a degradation that did not appear in training. InvFusion demonstrates strong adaptation capabilities through its degradation-aware architecture.
  • Figure 4: Using InvFusion NPPC. By being degradation-aware, the model can be trained to predict the MMSE along with several leading principal components ${\mathbf{w}}_i$ (left, contrast enhanced) for many degradations.
  • ...and 2 more figures
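
The Figure 2 caption describes applying ${\bm{H}}$ to the activations and comparing the result with the measurements ${\mathbf{y}} = {\bm{H}} {\mathbf{x}}$. A minimal sketch of that idea follows. It is not the authors' implementation: the residual comparison, the back-projection through the adjoint, the in-painting operator, and all names (`make_inpainting_operator`, `feature_degradation`) are assumptions made for illustration, and the learned layers of a real block are omitted.

```python
import numpy as np

def make_inpainting_operator(mask):
    """Return (H, H_T) for a hypothetical masking degradation.
    `mask` is a 0/1 array; masking is its own adjoint."""
    def H(x):
        return x * mask
    def H_T(r):
        return r * mask
    return H, H_T

def feature_degradation(features, y, H, H_T):
    """Apply H to each feature channel, compare with the measurement y,
    and map the residual back to feature space through H_T.
    features: (C, h, w) activations viewed as image-shaped channels.
    y:        (h, w) measurement."""
    out = np.empty_like(features)
    for c in range(features.shape[0]):
        residual = y - H(features[c])         # disagreement with the measurement
        out[c] = features[c] + H_T(residual)  # assumed correction step
    return out

rng = np.random.default_rng(0)
x = rng.standard_normal((8, 8))                      # stand-in clean image
mask = (rng.random((8, 8)) > 0.5).astype(x.dtype)    # random in-painting mask
H, H_T = make_inpainting_operator(mask)
y = H(x)                                             # measurement y = Hx

features = rng.standard_normal((4, 8, 8))            # stand-in activations
updated = feature_degradation(features, y, H, H_T)
```

For this masking operator, the update makes every channel agree with the measurement on observed pixels and leaves unobserved pixels unchanged. Because the layer only calls `H` and `H_T`, the same code runs for any linear degradation supplied at test time, which is the property the degradation-aware design relies on.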