Trustworthy SR: Resolving Ambiguity in Image Super-resolution via Diffusion Models and Human Feedback

Cansu Korkmaz, Ege Cirakman, A. Murat Tekalp, Zafer Dogan

TL;DR

The paper tackles ambiguity in diffusion-based SR by introducing LDM-SS, a human-in-the-loop sampling and ensembling framework. It leverages a pre-trained Latent Diffusion Model to generate an SR space and uses human feedback to select up to five informative samples, which are then ensembled into a single trustworthy image. Experiments on MNIST and DIV2K reveal improved perceptual trustworthiness and artifact suppression, though standard metrics like PSNR may not capture these gains. The method is general and complementary to other diffusion-based SR approaches, enabling reliable SR for information-critical applications such as digit recognition.

Abstract

Super-resolution (SR) is an ill-posed inverse problem with a large set of feasible solutions that are consistent with a given low-resolution image. Various deterministic algorithms aim to find a single solution that balances fidelity and perceptual quality; however, this trade-off often causes visual artifacts that introduce ambiguity in information-centric applications. On the other hand, diffusion models (DMs) excel in generating a diverse set of feasible SR images that span the solution space. The challenge is then how to determine the most likely solution among this set in a trustworthy manner. We observe that quantitative measures, such as PSNR, LPIPS, and DISTS, are not reliable indicators for resolving ambiguous cases. To this end, we propose employing human feedback, where we ask human subjects to select a small number of likely samples and we ensemble the averages of the selected samples. This strategy leverages the high-quality image generation capabilities of DMs, while recognizing the importance of obtaining a single trustworthy solution, especially in use cases, such as identification of specific digits or letters, where generating multiple feasible solutions may not lead to a reliable outcome. Experimental results demonstrate that our proposed strategy provides more trustworthy solutions when compared to state-of-the-art SR methods.
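
The selection-and-ensembling step described in the abstract can be illustrated with a minimal sketch. The source states only that human subjects select a small number of samples (up to five) and that the selected samples are averaged; the paper's Figure 4 caption describes the averaging as pixel-wise. Everything else here is an assumption made for illustration: the representation of votes as lists of sample indices, the tie-breaking rule, the equal weighting, and the hypothetical names `most_selected`, `pixelwise_average`, and `ensemble`. The diffusion sampling itself is not shown.

```python
from collections import Counter
from typing import List, Sequence

# Assumption: a grayscale image is a list of rows of pixel intensities.
Image = List[List[float]]


def most_selected(votes: Sequence[Sequence[int]], k: int = 5) -> List[int]:
    """Return the indices of the k samples picked most often across subjects.

    Each entry of `votes` is one subject's ballot: the sample indices that
    subject judged likely (a hypothetical encoding, not from the paper).
    """
    counts = Counter(idx for ballot in votes for idx in ballot)
    # Sort by descending count, then ascending index so ties are deterministic.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [idx for idx, _ in ranked[:k]]


def pixelwise_average(images: Sequence[Image]) -> Image:
    """Average equally sized images pixel by pixel with equal weights."""
    if not images:
        raise ValueError("need at least one image")
    height, width = len(images[0]), len(images[0][0])
    n = len(images)
    return [
        [sum(img[r][c] for img in images) / n for c in range(width)]
        for r in range(height)
    ]


def ensemble(samples: Sequence[Image],
             votes: Sequence[Sequence[int]],
             k: int = 5) -> Image:
    """Combine the k most-selected SR samples into a single image."""
    chosen = most_selected(votes, k)
    return pixelwise_average([samples[i] for i in chosen])


if __name__ == "__main__":
    # Four toy 1x2 "SR samples" and three subjects' ballots.
    samples = [[[0.0, 0.0]], [[1.0, 1.0]], [[0.5, 1.0]], [[0.0, 1.0]]]
    votes = [[1, 2], [2, 3], [1, 2]]
    print(most_selected(votes, k=2))
    print(ensemble(samples, votes, k=2))
```

Averaging only the samples that subjects agree on keeps structures common to those samples and attenuates details that appear in just one of them, which is one plausible reading of why the ensemble suppresses artifacts.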

Paper Structure (8 sections, 6 figures, 2 tables)

This paper contains 8 sections, 6 figures, 2 tables.

Figures (6)

  • Figure 1: Visual performance of recent $\times$4 SR methods on a crop from the Urban100 dataset (img-6) urban100_cite. SOTA methods reconstruct "5" as "6", whereas the opening in the lower part of "5" is visible in our result, confirming that the proposed strategy resolves the ambiguity and provides a trustworthy solution. Note that PSNR, DISTS, and other quantitative scores are not reliable indicators for resolving such ambiguity.
  • Figure 2: Demonstration of the SR space spanned by LDM LDM_rombach2022high samples, the proposed LDM-SS, and other state-of-the-art methods on the PSNR-DISTS plane. We note that the perception-distortion tradeoff with respect to known metrics does not correlate well with visual quality and trustworthiness.
  • Figure 3: Block diagram of our approach depicting sample selection from the diffusion model SR space by human feedback.
  • Figure 4: Identification of the digit from the LR image is impossible, and the results of state-of-the-art methods HAT hat_chen2023activating (Regressive) and SROOE srooe_Park_2023_CVPR (GAN-SR) are ambiguous. However, the five most selected samples were combined through pixel-wise averaging, yielding a single, informative SR image. The prevalence of the perception of "5" helps mitigate the ambiguity.
  • Figure 5: Visual comparison of the proposed method and state-of-the-art regressive (purple), GAN-based (blue), flow-based (red), and LDM SR methods on the MNIST dataset deng2012mnist. It can be seen that the proposed LDM-SS method provides more reliable SR images for information retrieval, but quantitative metrics are insufficient to capture the nuances of visual artifacts or trustworthiness.
  • ...and 1 more figure