A Modular Conditional Diffusion Framework for Image Reconstruction

Magauiya Zhussip, Iaroslav Koshelev, Stamatis Lefkimmiatis

TL;DR

This paper addresses the practical barriers to deploying diffusion probabilistic models for blind image restoration by introducing DP-IR, a modular framework that attaches a small fusion denoiser to off-the-shelf IR backbones. It leverages conditional score matching to estimate $\mathbb{E}\left[\bm x_0 | \bm y, \bm x_t\right]$, enabling conditional diffusion sampling with minimal extra training (≈0.7M parameters for the fusion module), and introduces a one-step accelerated sampling strategy that cuts the number of neural function evaluations (NFEs) by at least a factor of four. The method achieves state-of-the-art perceptual quality across burst JDD-SR, dynamic scene deblurring, and 4× SISR while maintaining competitive fidelity, and it can be combined with existing acceleration techniques such as DDIM. Ablations show the approach is plug-and-play across denoisers and IR backbones, offering a practical route to broader adoption of diffusion-based IR under resource constraints.

Abstract

Diffusion Probabilistic Models (DPMs) have recently been applied to various blind image restoration (IR) tasks, where they have demonstrated outstanding perceptual quality. However, the task-specific nature of existing solutions and the excessive computational cost of training them make such models impractical and hard to use for IR tasks other than those they were initially trained for. This hinders their wider adoption, especially by those who lack access to powerful computational resources and vast amounts of training data. In this work we aim to address these issues and enable the successful adoption of DPMs in practical IR-related applications. Towards this goal, we propose a modular diffusion probabilistic IR framework (DP-IR), which combines the performance benefits of existing pre-trained state-of-the-art IR networks and generative DPMs while requiring the additional training of only a relatively small module (0.7M parameters) specific to the IR task of interest. Moreover, the architecture of the proposed framework allows for a sampling strategy that reduces the number of neural function evaluations by at least a factor of four without any performance loss, and it can also be combined with existing acceleration techniques such as DDIM. We evaluate our model on four benchmarks for the tasks of burst JDD-SR, dynamic scene deblurring, and super-resolution. Our method outperforms existing approaches in terms of perceptual quality while retaining competitive performance with respect to fidelity metrics.

Paper Structure

This paper contains 43 sections, 4 theorems, 18 equations, 17 figures, 16 tables.

Key Result

Lemma 3.1

Let $\bm y\!\in\!\mathbb{R}^{M}$ and $\bm x_0\!\in\!\mathbb{R}^{N} \sim p\left(\bm x_0|\bm y\right)$, and let $\bm x_t\!\in\!\mathbb{R}^{N}$ and $\bar{\alpha}_t\!\in\!\mathbb{R}$ be defined as in eq:cumulative_forward. Then the conditional score function is computed as: $\nabla_{\bm x_t} \log p\left(\bm x_t|\bm y\right) = \sqrt{\bar{\alpha}_t}\,\mathbb{E}[\bm x_0 | \ldots$
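The lemma statement is truncated in this listing, but an identity of this type can be checked numerically in a toy setting. The sketch below is not taken from the paper. It assumes that eq:cumulative_forward is the standard DDPM cumulative kernel $\bm x_t = \sqrt{\bar{\alpha}_t}\,\bm x_0 + \sqrt{1-\bar{\alpha}_t}\,\bm\epsilon$ with $\bm\epsilon \sim \mathcal{N}(\bm 0, \bm I)$, and it uses a hypothetical 1-D Gaussian conditional prior $p(x_0|y) = \mathcal{N}(\mu, s^2)$. Under those assumptions Tweedie's formula gives the score as $\left(\sqrt{\bar{\alpha}_t}\,\mathbb{E}[x_0|y,x_t] - x_t\right)/\left(1-\bar{\alpha}_t\right)$, which the code compares against the closed-form score of the Gaussian marginal $p(x_t|y)$. All function names are illustrative.

```python
import math


def marginal_score(x_t, mu, s2, a_bar):
    """Score of p(x_t | y) computed directly.

    With x_0 | y ~ N(mu, s2) and the assumed forward kernel,
    x_t | y ~ N(sqrt(a_bar) * mu, a_bar * s2 + 1 - a_bar).
    """
    var_t = a_bar * s2 + (1.0 - a_bar)
    return -(x_t - math.sqrt(a_bar) * mu) / var_t


def posterior_mean(x_t, mu, s2, a_bar):
    """E[x_0 | y, x_t] for the same linear-Gaussian toy model."""
    var_t = a_bar * s2 + (1.0 - a_bar)
    gain = math.sqrt(a_bar) * s2 / var_t
    return mu + gain * (x_t - math.sqrt(a_bar) * mu)


def tweedie_score(x_t, mu, s2, a_bar):
    """Score obtained from the posterior mean via Tweedie's formula."""
    x0_hat = posterior_mean(x_t, mu, s2, a_bar)
    return (math.sqrt(a_bar) * x0_hat - x_t) / (1.0 - a_bar)


if __name__ == "__main__":
    # Hypothetical values chosen only for illustration.
    for x_t in (-1.5, 0.3, 2.0):
        direct = marginal_score(x_t, mu=1.0, s2=2.0, a_bar=0.6)
        via_mean = tweedie_score(x_t, mu=1.0, s2=2.0, a_bar=0.6)
        print(f"x_t={x_t:+.1f}  direct={direct:+.6f}  tweedie={via_mean:+.6f}")
```

In this toy model the two columns agree to floating-point precision. This is why a network that predicts $\mathbb{E}[\bm x_0|\bm y,\bm x_t]$ is enough to drive conditional sampling.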

Figures (17)

  • Figure 1: The proposed architecture consists of three modules: a Denoising Network $\bm \phi^D_{\bm \theta_D}\left(\tilde{\bm x}_t, \tilde{\sigma}_t\right)$, an IR Network $\bm \phi^{IR}_{\bm \theta_{IR}}\left(\bm y\right)$ and a Fusion Network $\bm \phi^F_{\bm \theta_F}\left(\bm x_0^{IR}, \bm x_0^{D}, t\right)$. A small version of MIRNet [zamir2020learning] is used as the Denoising Network, while a pre-trained SwinIR [liang2021swinir], BSRT [luo2022bsrt] or FFTFormer [kong2023efficient] is used as the IR Network, depending on the IR task. See \ref{sec:network_architecture} for a detailed description.
  • Figure 2: Forward and reverse diffusion process. Blue solid arrows: transitions in the forward pass, with the sampling distribution from \ref{eq:diffusion_kernel}. Dashed arrow: cumulative transition probability from \ref{eq:cumulative_forward}. Black solid arrows: transitions in the backward pass, with the sampling distribution from \ref{eq:posterior}. Red solid arrow: closed-form cumulative transition probability from \ref{eq:single_step}, representing our accelerated sampling.
  • Figure 3: Visual comparisons on the GoPro test set for the task of dynamic scene deblurring (best viewed by zooming in). Every output image is accompanied by its LPIPS value.
  • Figure 4: Visual comparisons on the DIV2K validation set for the task of $4\times$ bicubic super-resolution (best viewed by zooming in). Every output image is accompanied by its LPIPS value.
  • Figure 5: Detailed visualization of the proposed Fusion module.
  • ...and 12 more figures
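
The module signatures in the Figure 1 caption suggest how a single estimate of $\bm x_0$ could be assembled. The sketch below is only an illustration under stated assumptions, not the authors' implementation. Plain Python lists stand in for image tensors. The linear beta schedule, the rescaling $\tilde{\bm x}_t = \bm x_t/\sqrt{\bar{\alpha}_t}$ with $\tilde{\sigma}_t = \sqrt{(1-\bar{\alpha}_t)/\bar{\alpha}_t}$, and the three toy modules are all hypothetical placeholders. The forward sampler assumes the standard DDPM cumulative kernel. Because the IR Network takes only $\bm y$ as input, the sketch computes its output once and reuses it.

```python
import math
import random


def make_alpha_bar(num_steps=1000, beta_start=1e-4, beta_end=2e-2):
    """Cumulative products of (1 - beta_t) for an assumed linear beta schedule."""
    alpha_bar, prod = [], 1.0
    for i in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * i / (num_steps - 1)
        prod *= 1.0 - beta
        alpha_bar.append(prod)
    return alpha_bar


def forward_sample(x0, a_bar, rng):
    """Draw x_t from x_0 in one step with the cumulative Gaussian kernel."""
    scale, noise = math.sqrt(a_bar), math.sqrt(1.0 - a_bar)
    return [scale * v + noise * rng.gauss(0.0, 1.0) for v in x0]


def estimate_x0(x_t, x0_ir, t, alpha_bar, denoiser, fusion):
    """Combine the cached IR estimate with a denoiser estimate at step t."""
    a_bar = alpha_bar[t]
    # Assumed rescaling: x_t / sqrt(a_bar) equals x_0 plus noise of std sigma_tilde.
    x_tilde = [v / math.sqrt(a_bar) for v in x_t]
    sigma_tilde = math.sqrt((1.0 - a_bar) / a_bar)
    x0_d = denoiser(x_tilde, sigma_tilde)
    return fusion(x0_ir, x0_d, t)


# Hypothetical toy modules standing in for the trained networks.
def toy_ir_net(y):
    return list(y)  # identity "restoration"


def toy_denoiser(x_tilde, sigma_tilde):
    shrink = 1.0 / (1.0 + sigma_tilde ** 2)  # simple shrinkage toward zero
    return [shrink * v for v in x_tilde]


def toy_fusion(x0_ir, x0_d, t):
    return [0.5 * (a + b) for a, b in zip(x0_ir, x0_d)]  # plain average


rng = random.Random(0)
alpha_bar = make_alpha_bar()
x0 = [0.25, -0.1, 0.4]   # toy clean signal
y = [0.2, -0.1, 0.4]     # toy degraded observation
x0_ir = toy_ir_net(y)    # depends only on y, so it is computed once
x_t = forward_sample(x0, alpha_bar[500], rng)
x0_hat = estimate_x0(x_t, x0_ir, 500, alpha_bar, toy_denoiser, toy_fusion)
```

In a full sampler, `estimate_x0` would be called once per reverse step with only the denoiser and fusion modules re-evaluated. The exact form of the accelerated step referenced as eq:single_step is not reproduced in this listing and is not modeled here.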

Theorems & Definitions (5)

  • Lemma 3.1
  • Lemma 3.2
  • Proposition C.1
  • Lemma C.2
  • proof : Proof