Deep Optimal Transport: A Practical Algorithm for Photo-realistic Image Restoration

Theo Adrai, Guy Ohayon, Tomer Michaeli, Michael Elad

TL;DR

This work tackles the perception–distortion trade-off in image restoration by approximating the $D_{max}$ predictor via optimal transport (OT) from the distribution of MMSE outputs to the distribution of natural images. It introduces a practical, few-shot method that performs OT in the latent space of a pre-trained VAE, using the closed-form transport map between multivariate Gaussians to move restored samples toward higher perceptual quality. The approach can be wrapped around any existing restoration model and, through a test-time interpolation parameter $\alpha$, allows deliberate balancing of perceptual quality and distortion with minimal data and computation. Empirically, the method yields consistent perceptual improvements across diverse tasks (SISR, denoising, JPEG, NSR, CSR) and model families (GAN- and diffusion-based), while remaining lightweight and plug-and-play for real-world deployment.

Abstract

We propose an image restoration algorithm that can control the perceptual quality and/or the mean square error (MSE) of any pre-trained model, trading one for the other at test time. Our algorithm is few-shot: given about a dozen images restored by the model, it can significantly improve the perceptual quality and/or the MSE of the model for newly restored images without further training. Our approach is motivated by a recent theoretical result that links the minimum MSE (MMSE) predictor and the predictor that minimizes the MSE under a perfect perceptual quality constraint. Specifically, it has been shown that the latter can be obtained by optimally transporting the output of the former, such that its distribution matches the source data. Thus, to improve the perceptual quality of a predictor that was originally trained to minimize MSE, we approximate the optimal transport by a linear transformation in the latent space of a variational auto-encoder, which we compute in closed form using empirical means and covariances. Going beyond the theory, we find that applying the same procedure to models that were initially trained to achieve high perceptual quality typically improves their perceptual quality even further. By interpolating the results with the original output of the model, we can also improve their MSE at the expense of perceptual quality. We illustrate our method on a variety of degradations applied to general-content images of arbitrary dimensions.
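The closed-form step described in the abstract can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation. It uses the standard optimal transport map between two Gaussians, $T(z) = \mu_t + A(z - \mu_s)$ with $A = \Sigma_s^{-1/2}\left(\Sigma_s^{1/2}\Sigma_t\Sigma_s^{1/2}\right)^{1/2}\Sigma_s^{-1/2}$, estimated from empirical means and covariances. The function names, the `eps` regularizer, and the convention that $\alpha = 0$ returns the original sample and $\alpha = 1$ the fully transported one are assumptions made for this sketch. The VAE encoding and decoding steps are omitted, so the input arrays stand in for latent vectors.

```python
import numpy as np

def sqrtm_psd(mat, eps=1e-8):
    """Square root of a symmetric PSD matrix via eigendecomposition."""
    vals, vecs = np.linalg.eigh(mat)
    vals = np.clip(vals, eps, None)  # guard against tiny negative eigenvalues
    return (vecs * np.sqrt(vals)) @ vecs.T

def fit_gaussian_ot(src, tgt, eps=1e-6):
    """Fit the linear OT map between Gaussians estimated from samples.

    src, tgt: arrays of shape (n_samples, dim) with dim >= 2, standing in
    for latents of restored and natural images (hypothetical inputs).
    Returns (mu_s, mu_t, A) such that T(z) = mu_t + A (z - mu_s).
    """
    dim = src.shape[1]
    mu_s, mu_t = src.mean(axis=0), tgt.mean(axis=0)
    # eps * I is an assumed regularizer that keeps the covariances invertible.
    cov_s = np.cov(src, rowvar=False) + eps * np.eye(dim)
    cov_t = np.cov(tgt, rowvar=False) + eps * np.eye(dim)
    s_half = sqrtm_psd(cov_s)
    s_inv_half = np.linalg.inv(s_half)
    A = s_inv_half @ sqrtm_psd(s_half @ cov_t @ s_half) @ s_inv_half
    return mu_s, mu_t, A

def transport(z, mu_s, mu_t, A, alpha=1.0):
    """Apply the map to rows of z and blend with the input.

    alpha = 0 keeps z unchanged, alpha = 1 applies the full transport;
    values outside [0, 1] extrapolate (assumed convention).
    """
    z_ot = mu_t + (z - mu_s) @ A.T
    return (1.0 - alpha) * z + alpha * z_ot
```

In this sketch, `fit_gaussian_ot` would be called once on a small set of latents, and `transport` would then be applied to each new restored sample, with `alpha` chosen at test time to trade perceptual quality against distortion.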

Paper Structure (24 sections, 6 equations, 8 figures, 1 table)

Figures (8)

  • Figure 1: The $\mathcal{W}_2$-MSE trade-off (Freirich et al., 2021).
  • Figure 2: Our few-shot algorithm improves the visual quality of any estimator at test time. For example, we can improve the photo-realism of DDRM (Kawar et al., 2022) even further.
  • Figure 3: Trading perception and distortion using out-of-the-box predictors, wrapped with our method. Using Eq. \ref{eq:interp} with $\alpha\in[0,1]$, we interpolate between a given predictor (orange) and our improved $\mathbf{D_{max}}$ estimation (green) to approximate the PD FID-MSE function (blue curve). With $\alpha \in [-1,0]\cup[1,2]$, we extrapolate outside of the PD curve (light gray), beyond the theory-inspired area, to further improve performance.
  • Figure 4: With a pre-trained VAE, we estimate the first- and second-order statistics of the latent patches of natural images and of the restorations of some given estimator. At inference time, we use the closed-form OT operator between MVG distributions (Eq. \ref{eq:tmvg}) to transport the latent representation of a given restored sample, which, after decoding, increases the visual quality of the restored sample. For a fully detailed explanation of the algorithm, see Section \ref{method}.
  • Figure 5: Our method (third column from the left) notably improves the results of several benchmark predictors (second column from the left) on various degradations.
  • ...and 3 more figures