Inversion by Direct Iteration: An Alternative to Denoising Diffusion for Image Restoration

Mauricio Delbracio, Peyman Milanfar

TL;DR

InDI reframes image restoration as iterative, direct inversion that progressively refines a degraded input without predicting the clean target in one shot, thereby reducing regression-to-the-mean and enhancing perceptual realism. It defines a forward degradation $x_t=(1-t)x+ty$ for $t\in[0,1]$, where $x$ is the clean image (so $x_0=x$) and $y$ is the degraded observation (so $x_1=y$), and learns a single regressor $F_\theta(x_t,t)$ to approximate $\mathbb{E}[x_0|x_t]$. This enables small-step updates that, in the limit of vanishing step size, form a residual-flow ODE $\frac{d x_t}{d t}=\frac{x_t-F_\theta(x_t,t)}{t}$ and connect to score-matching diffusion under Gaussian noise. Empirically, InDI delivers superior perceptual quality across motion deblurring, 4× super-resolution, defocus deblurring, and JPEG artifact removal, often matching or surpassing diffusion-based baselines while requiring fewer inference steps; it also explores the impact of $p(t)$ and input noise on restoration, and proposes a unified, supervised framework without explicit degradation models. The work notes limitations, including dependence on paired data and potential instability with too many steps, and suggests future work on robustness to distribution shift and alternative inference strategies.
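
The small-step update mentioned above can be sketched as a backward Euler-style discretization of the ODE: moving from $t$ to $t-\delta$ gives $x_{t-\delta}=\frac{\delta}{t}F_\theta(x_t,t)+\left(1-\frac{\delta}{t}\right)x_t$. The following is a minimal sketch under stated assumptions, not the authors' implementation: the function name `indi_restore`, the uniform step size, and the `oracle` stand-in for a trained regressor are all hypothetical.

```python
import numpy as np

def indi_restore(y, regressor, num_steps):
    """Iteratively restore `y` starting at t = 1 and ending at t = 0.

    `regressor(x_t, t)` is assumed to approximate E[x_0 | x_t].
    A uniform step size delta = 1 / num_steps is an assumption.
    """
    x_t = np.asarray(y, dtype=float)
    delta = 1.0 / num_steps
    for k in range(num_steps, 0, -1):
        t = k * delta
        # Blend the current iterate with the regressor's estimate.
        # At the last step (t == delta) the weight delta / t equals 1.
        x_t = (delta / t) * regressor(x_t, t) + (1.0 - delta / t) * x_t
    return x_t

# Hypothetical usage: an "oracle" regressor that always returns the clean
# signal stands in for a trained network.
clean = np.array([0.2, 0.8, 0.5])
degraded = clean + np.array([0.3, -0.4, 0.1])
oracle = lambda x_t, t: clean
restored = indi_restore(degraded, oracle, num_steps=10)
```

With `num_steps=1` the loop reduces to a single call `regressor(y, 1.0)`, which is the one-shot regression that InDI is contrasted with.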

Abstract

Inversion by Direct Iteration (InDI) is a new formulation for supervised image restoration that avoids the so-called "regression to the mean" effect and produces more realistic and detailed images than existing regression-based methods. It does this by gradually improving image quality in small steps, similar to generative denoising diffusion models. Image restoration is an ill-posed problem where multiple high-quality images are plausible reconstructions of a given low-quality input. Consequently, the outcome of a single-step regression model is typically an aggregate of all possible explanations and thus lacks detail and realism. The main advantage of InDI is that it does not try to predict the clean target image in a single step but instead gradually improves the image in small steps, resulting in better perceptual quality. While generative denoising diffusion models also work in small steps, our formulation is distinct in that it does not require knowledge of any analytic form of the degradation process. Instead, we directly learn an iterative restoration process from paired low-quality and high-quality examples. InDI can be applied to virtually any image degradation, given paired training data. In conditional denoising diffusion image restoration, the denoising network generates the restored image by repeatedly denoising an initial image of pure noise, conditioned on the degraded input. In contrast to conditional denoising formulations, InDI proceeds directly by iteratively restoring the low-quality input image, producing high-quality results on a variety of image restoration tasks, including motion and out-of-focus deblurring, super-resolution, compression artifact removal, and denoising.
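
To illustrate how a restoration process could be learned from pairs alone, the sketch below builds training inputs by blending each clean/degraded pair at a random time $t$, using $x_t=(1-t)x+ty$ from the TL;DR, and scores a regressor against the clean target. Several choices here are assumptions made for illustration: the helper names `make_indi_batch` and `regression_loss`, the uniform sampling of $t$, and the squared-error loss (chosen because its minimizer is the conditional mean $\mathbb{E}[x_0|x_t]$; the paper may use a different norm or a different $p(t)$).

```python
import numpy as np

def make_indi_batch(clean, degraded, rng):
    """Return (x_t, t) with x_t = (1 - t) * clean + t * degraded per example."""
    batch = clean.shape[0]
    # rng.random draws from [0, 1), so t lies in (0, 1]. Uniform p(t) is an assumption.
    t = 1.0 - rng.random(batch)
    # Reshape t so it broadcasts over all non-batch axes.
    t_b = t.reshape((batch,) + (1,) * (clean.ndim - 1))
    x_t = (1.0 - t_b) * clean + t_b * degraded
    return x_t, t

def regression_loss(regressor, clean, degraded, rng):
    """Mean squared error between regressor(x_t, t) and the clean target."""
    x_t, t = make_indi_batch(clean, degraded, rng)
    return float(np.mean((regressor(x_t, t) - clean) ** 2))

# Hypothetical usage with random arrays standing in for image pairs.
rng = np.random.default_rng(0)
clean = rng.random((4, 8, 8))
degraded = clean + 0.1 * rng.standard_normal((4, 8, 8))
x_t, t = make_indi_batch(clean, degraded, rng)
identity_loss = regression_loss(lambda x, t: x, clean, degraded, rng)
```

No analytic degradation operator appears anywhere in the sketch: only the paired arrays are used, which is the point the abstract makes.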

Paper Structure

This paper contains 23 sections, 1 theorem, 24 equations, 23 figures, 3 tables, 1 algorithm.

Key Result

Proposition 4.1

Let ${\bm{x}}_s, {\bm{x}}_t$ be given by the forward degradation model ${\bm{x}}_t=(1-t){\bm{x}}+t{\bm{y}}$ (referenced in the source as eq:model), where $s \le t$. Then,
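
The statement is cut off in this listing. As a hedged reconstruction rather than the paper's verbatim wording, the following identity follows by direct calculation from the forward model $x_t=(1-t)x+ty$ in the TL;DR, assuming $t>0$ and writing ${\bm{x}}_0={\bm{x}}$. Eliminating ${\bm{y}}$ between the expressions for ${\bm{x}}_s$ and ${\bm{x}}_t$ and then conditioning on ${\bm{x}}_t$ gives:

$$
\begin{aligned}
{\bm{y}} &= \frac{{\bm{x}}_t-(1-t)\,{\bm{x}}_0}{t}, \\
{\bm{x}}_s &= (1-s)\,{\bm{x}}_0+s\,{\bm{y}}
 = \Big(1-\frac{s}{t}\Big){\bm{x}}_0+\frac{s}{t}\,{\bm{x}}_t, \\
\mathbb{E}[{\bm{x}}_s\,|\,{\bm{x}}_t] &= \Big(1-\frac{s}{t}\Big)\,\mathbb{E}[{\bm{x}}_0\,|\,{\bm{x}}_t]+\frac{s}{t}\,{\bm{x}}_t .
\end{aligned}
$$

Setting $s=t-\delta$ and replacing $\mathbb{E}[{\bm{x}}_0|{\bm{x}}_t]$ with $F_\theta({\bm{x}}_t,t)$ yields the update ${\bm{x}}_{t-\delta}=\frac{\delta}{t}F_\theta({\bm{x}}_t,t)+\big(1-\frac{\delta}{t}\big){\bm{x}}_t$, and letting $\delta\to 0$ recovers the ODE $\frac{d {\bm{x}}_t}{d t}=\frac{{\bm{x}}_t-F_\theta({\bm{x}}_t,t)}{t}$ quoted in the TL;DR, which is a consistency check on the reconstruction.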

Figures (23)

  • Figure 1: 2D Toy Example. Estimation of conditional mean and iterated estimation for points from a multimodal (4 modes) distribution under: (a) Denoising strong Gaussian noise (${\bm{H}} = {\bm{I}}$); and (b) missing information recovery, i.e., ${\bm{H}}=[1, 0; 0, 0]$, under moderate noise. Blue points represent observed samples, while red ones are the regression prediction. The black (hollow) circles represent the final point in our iterative procedure, always reaching a valid point in the data manifold (orange points). The small green circles indicate the iterative restoration path.
  • Figure 2: Examples of deblurred images from the GoPro dataset. Our iterative reconstruction recovers detailed textures better than regression-based models (Restormer, Maxim) and reaches quality similar to conditional DPMs (DvSR). More results are provided in the Appendix.
  • Figure 3: Number of steps. The total number of steps in the iterative regression has a direct impact on the quality. The number of steps seems to control the perception-distortion tradeoff. One step leads to the best possible MSE reconstruction (minimum distortion) but a large perceptual discrepancy.
  • Figure 4: Examples of $4\times$ image upscaling. Our iterative reconstruction recovers detailed textures better than the RRDB regression model [wang2018esrgan], shows fewer high-frequency artifacts than the SRFlow generative normalizing flow [lugmayr2020srflow], and reaches visual quality comparable to LDL [liang2022details], a state-of-the-art customized generative adversarial model. More results are given in the Appendix.
  • Figure 5: $4\times$ super-resolution on the DIV2K dataset [div2k]. Best and second-best values for each metric are color-coded.
  • ...and 18 more figures
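
The behaviour described in the captions of Figures 1 and 3 (a single regression step lands between modes, while many small steps end on the data) can be reproduced in a much simpler setting than the paper's 2D example. The sketch below is an illustration under its own assumptions, not the paper's experiment: the clean signal is a scalar taking the values $-1$ or $+1$ with equal probability, the observation is $y=x+\sigma n$ with standard Gaussian $n$, and the regressor is the closed-form posterior mean $\tanh\!\big(x_t/(t\sigma)^2\big)$ for $x_t=(1-t)x+ty$ instead of a trained network. The names `SIGMA`, `posterior_mean`, and `indi_restore` are hypothetical.

```python
import numpy as np

SIGMA = 2.0  # assumed noise level in y = x + SIGMA * n (strong noise)

def posterior_mean(x_t, t):
    """Exact E[x | x_t] when x is -1 or +1 with equal probability.

    Under the forward model, x_t = x + t * SIGMA * n, so the posterior
    mean is tanh(x_t / (t * SIGMA)**2).
    """
    return np.tanh(x_t / (t * SIGMA) ** 2)

def indi_restore(y, regressor, num_steps):
    """Small-step update x_{t-d} = (d/t) F(x_t, t) + (1 - d/t) x_t, d = 1/num_steps."""
    x_t = np.asarray(y, dtype=float)
    delta = 1.0 / num_steps
    for k in range(num_steps, 0, -1):
        t = k * delta
        x_t = (delta / t) * regressor(x_t, t) + (1.0 - delta / t) * x_t
    return x_t

y = np.array([-1.7, -0.5, 0.3, 2.4])          # noisy observations
one_step = indi_restore(y, posterior_mean, num_steps=1)
many_steps = indi_restore(y, posterior_mean, num_steps=100)

print("one step  :", one_step)    # posterior means, pulled toward 0
print("100 steps :", many_steps)  # values close to the modes -1 and +1
```

In this toy, the one-step output is the minimum-MSE estimate but lies between the two modes for ambiguous observations, whereas the iterated output commits to one mode, mirroring the perception-distortion tradeoff described for Figure 3.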

Theorems & Definitions (3)

  • Proposition 4.1
  • Remark 4.2
  • proof