The Perception-Robustness Tradeoff in Deterministic Image Restoration

Guy Ohayon, Tomer Michaeli, Michael Elad

TL;DR

This work proves a fundamental limitation: for non-invertible degradations, any deterministic image-restoration estimator achieving high joint perceptual quality must have a large Lipschitz constant, making it vulnerable to adversarial perturbations. The authors formalize the bound $\mathrm{Lip}(\hat{X}) \ge \frac{m_1}{\sqrt{W_p(p_{X,Y},p_{\hat{X},Y})}}-m_2$, connect it to the Wasserstein-based joint perceptual index, and validate the tradeoff through toy and real single-image super-resolution experiments across multiple degradations. They show that smaller joint perceptual distance (better perceptual quality and consistency) correlates with increased instability, but also demonstrate how this instability can be exploited to imitate stochastic posterior sampling via input perturbations (FPS-style exploration). The results highlight a practical tension between perceptual fidelity and robustness, with implications for attack surfaces and uncertainty quantification in restoration systems, and point to a path for posterior sampling using deterministic models. The work thus provides both a cautionary perspective on deterministic restorers and a tool for posterior-like exploration in imaging pipelines.

Abstract

We study the behavior of deterministic methods for solving inverse problems in imaging. These methods are commonly designed to achieve two goals: (1) attaining high perceptual quality, and (2) generating reconstructions that are consistent with the measurements. We provide a rigorous proof that the better a predictor satisfies these two requirements, the larger its Lipschitz constant must be, regardless of the nature of the degradation involved. In particular, to approach perfect perceptual quality and perfect consistency, the Lipschitz constant of the model must grow to infinity. This implies that such methods are necessarily more susceptible to adversarial attacks. We demonstrate our theory on single image super-resolution algorithms, addressing both noisy and noiseless settings. We also show how this undesired behavior can be leveraged to explore the posterior distribution, thereby allowing the deterministic model to imitate stochastic methods.

Paper Structure

This paper contains 39 sections, 5 theorems, 64 equations, 23 figures, 1 table, 1 algorithm.

Key Result

Theorem 4.1

Consider any joint probability density function $p_{X,Y}$ of the random variables $X$ and $Y$ such that the degradation is not invertible (in the sense of the paper's definition of a non-invertible degradation). For every $\gamma>0$, there exist constants $m_{1},m_{2}>0$ such that any deterministic estimator $\hat{X}=f(Y)$ of $X$ from $Y$ with joint perceptual index $W_{p}(p_{X,Y},p_{\hat{X},Y})\leq\gamma$ satisfies $\mathrm{Lip}(\hat{X}) \ge \frac{m_1}{\sqrt{W_p(p_{X,Y},p_{\hat{X},Y})}}-m_2$.
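To get a feel for how the bound quoted in the TL;DR, $\frac{m_1}{\sqrt{W_p}}-m_2$, behaves as the joint perceptual index shrinks, here is a minimal numeric sketch. The function name `lipschitz_lower_bound` and the values of `M1` and `M2` are hypothetical and chosen only for illustration; the theorem guarantees that such constants exist for a given $\gamma$ but this summary does not give their values.

```python
import math


def lipschitz_lower_bound(w, m1, m2):
    """Evaluate m1 / sqrt(w) - m2 for a joint perceptual index w > 0."""
    if w <= 0:
        raise ValueError("the joint perceptual index must be positive")
    return m1 / math.sqrt(w) - m2


# Hypothetical constants: the theorem only states that some m1, m2 > 0 exist.
M1, M2 = 1.0, 0.5

# The bound applies to estimators whose index is at most gamma; here we simply
# scan a few decreasing values to see the 1 / sqrt(w) growth.
for w in (1.0, 1e-2, 1e-4, 1e-6):
    print(f"W_p = {w:g}: Lip >= {lipschitz_lower_bound(w, M1, M2):g}")
```

Whatever the actual constants are, the right-hand side diverges as $W_p \to 0$, which is the sense in which perfect joint perceptual quality forces an unbounded Lipschitz constant.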

Figures (23)

  • Figure 1: Qualitative illustration of Theorem 4.1. In any restoration task with a non-invertible degradation (as defined in the paper), the Lipschitz constant $K$ of a deterministic estimator $\hat{X}=f(Y)$ is lower bounded by a function that grows to infinity as the Wasserstein distance between $p_{\hat{X},Y}$ and $p_{X,Y}$ decreases to zero.
  • Figure 2: An illustration of Theorem 4.1 on the toy example from the paper's toy example section. On the left, we plot the Lipschitz constant lower bound $\overline{K}$ versus the JEMD (joint perceptual index) of $\hat{X}_{\lambda}=f_{\lambda}(Y)$, for several values of $\lambda$ (the coefficient of the robustness loss). On the right, we present contour plots of the density $p_{X,Y}$ (blue concentric ellipses) and outputs from $\hat{X}_{\lambda}$ as a function of $Y$, for several values of $\lambda$. We clearly see that $\hat{X}_{\lambda}$ is more erratic for smaller values of $\lambda$, as anticipated by Theorem 4.1. Refer to the toy example section for more details.
  • Figure 3: Quantitative demonstration of Theorem 4.1. We plot $\overline{K}$ versus $\sqrt{\text{JFID}}$ of several image super-resolution algorithms evaluated on several degradations. (a) Results on the Track2 challenge degradation by Lugmayr et al. (2019). (b) Results on the Track1 challenge degradation by Lugmayr et al. (2020). (c) Results on the standard bicubic $\times 4$ down-sampling degradation. As anticipated by Theorem 4.1, we see a tradeoff between $\overline{K}$ and $\sqrt{\text{JFID}}$ for all three degradations, i.e., the Lipschitz constant is lower bounded by a function that increases as the joint perceptual index decreases (see the real-world experiments section).
  • Figure 4: Visual comparison of some of the super-resolution algorithms evaluated in the real-world experiments section on the Track2 challenge degradation by Lugmayr et al. (2019), sorted from top to bottom by their JFID (increasing). The original and the attacked outputs are denoted by $f(y)$ and $f(y_{adv})$, respectively. The PSNR between $y$ and $y_{adv}$ is at least $48.13\,\text{dB}$ (obtained with $\alpha=1/255$ in I-FGSM), so the difference is visually negligible (see the attacked input $y_{adv}$ of SwinIR-GAN at the top). The PSNR between $f(y)$ and $f(y_{adv})$ is reported next to the name of each algorithm. As can be seen, the better the joint perceptual quality, the higher the sensitivity to adversarial attacks.
  • Figure 5: Adversarial attacks on low-resolution face images intended to alter the outputs of GFPGAN and RRDB to produce a face which is classified as "female" rather than "male". From left to right: original input $y$, attacked input $y_{adv}$, original output $f(y)$, and output obtained from the attacked input, $f(y_{adv})$. The perturbed inputs $y_{adv}$ are obtained with $\alpha=16/255$ in I-FGSM. With such a value of $\alpha$ we barely see any visual difference between $y$ and $y_{adv}$. As anticipated, $y_{adv}$ indeed leads to outputs with newly generated features when using GFPGAN (e.g., makeup), yet we barely see any significant change for RRDB. An image gender classifier associates these features with the "female" category, as the predicted class of the GFPGAN outputs switches from "male" when the input is $y$ to "female" when the input is $y_{adv}$. Refer to the adversarial attacks section for more details.
  • ...and 18 more figures
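
The $48.13\,\text{dB}$ figure in the Figure 4 caption is what one gets if $\alpha$ bounds the per-pixel ($\ell_\infty$) perturbation of images scaled to $[0,1]$: the mean squared error is then at most $\alpha^2$, so the PSNR is at least $20\log_{10}(1/\alpha)\approx 48.13\,\text{dB}$ for $\alpha=1/255$. The sketch below checks this numerically and also computes the ratio $\|f(y_{adv})-f(y)\|/\|y_{adv}-y\|$, which is a lower bound on the Lipschitz constant of any $f$. It rests on several assumptions that are not taken from the paper: the toy linear "restorer" `A`, the helper names, the step size `alpha / steps`, the random start, and the squared-error attack loss are all illustrative, and the paper's exact procedure for computing $\overline{K}$ may differ.

```python
import numpy as np


def psnr(a, b, peak=1.0):
    """Peak signal-to-noise ratio in dB for arrays with values in [0, peak]."""
    mse = np.mean((a - b) ** 2)
    return float("inf") if mse == 0 else 10.0 * np.log10(peak ** 2 / mse)


def ifgsm(grad_loss, y, alpha, steps=10, seed=0):
    """Iterative FGSM that keeps y_adv inside an l_inf ball of radius alpha.

    Assumption: alpha is the l_inf budget. The step size alpha / steps and the
    random start are illustrative choices, not taken from the paper.
    """
    rng = np.random.default_rng(seed)
    # Random start: the gradient of ||f(y') - f(y)||^2 vanishes at y' = y.
    y_adv = np.clip(y + rng.uniform(-alpha, alpha, size=y.shape), 0.0, 1.0)
    for _ in range(steps):
        y_adv = y_adv + (alpha / steps) * np.sign(grad_loss(y_adv))
        y_adv = np.clip(y_adv, y - alpha, y + alpha)  # project onto the budget
        y_adv = np.clip(y_adv, 0.0, 1.0)              # remain a valid image
    return y_adv


def lipschitz_ratio(f, y, y_adv):
    """||f(y_adv) - f(y)|| / ||y_adv - y||, a lower bound on Lip(f)."""
    return np.linalg.norm(f(y_adv) - f(y)) / np.linalg.norm(y_adv - y)


# Hypothetical toy "restorer": a fixed linear map f(y) = A y.
rng = np.random.default_rng(1)
A = rng.normal(size=(64, 16))
f = lambda v: A @ v
y = rng.uniform(0.0, 1.0, size=16)
alpha = 1.0 / 255.0

# Gradient of the attack loss ||f(y') - f(y)||^2 with respect to y'.
grad_loss = lambda v: 2.0 * A.T @ (f(v) - f(y))

y_adv = ifgsm(grad_loss, y, alpha)
print("PSNR(y, y_adv) [dB]:", psnr(y, y_adv))
print("worst-case PSNR [dB]:", 20.0 * np.log10(1.0 / alpha))
print("Lipschitz lower bound:", lipschitz_ratio(f, y, y_adv))
```

For a real restoration network, `grad_loss` would come from automatic differentiation rather than a closed form; the projection step and the ratio are unchanged.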

Theorems & Definitions (12)

  • Definition 3.1
  • Theorem 4.1
  • proof
  • Definition 3.1
  • Lemma 3.2
  • proof
  • Lemma 3.3
  • proof
  • Lemma 3.4
  • proof
  • ...and 2 more