Evaluating and Preserving High-level Fidelity in Super-Resolution

Josep M. Rocafort, Shaolin Su, Alexandra Gomez-Villa, Javier Vazquez-Corral

TL;DR

The paper argues that evaluating SR requires assessing high-level fidelity—the preservation of semantics and key content—beyond traditional perceptual-quality metrics. It constructs the first annotated dataset of fidelity changes across five SR models (including diffusion-based methods) and analyzes how current models vary in semantic preservation. The authors show that foundation-model embeddings better capture high-level fidelity than traditional IQA metrics and introduce a cosine-similarity fidelity metric, which they then use to fine-tune an SR model. Results demonstrate gains in semantic fidelity and perceptual quality, highlighting the practical value of including fidelity feedback in SR evaluation and optimization.

Abstract

Recent image Super-Resolution (SR) models achieve impressive results in reconstructing details and delivering visually pleasing outputs. However, their strong generative ability can sometimes hallucinate and thus change the image content, even while attaining high visual quality. This type of high-level change is easily identified by humans, yet it is not well captured by existing low-level image quality metrics. In this paper, we establish the importance of measuring high-level fidelity for SR models as a complementary criterion that reveals the reliability of generative SR models. We construct the first annotated dataset with fidelity scores from different SR models, and evaluate how state-of-the-art (SOTA) SR models actually perform in preserving high-level fidelity. Based on the dataset, we then analyze how existing image quality metrics correlate with fidelity measurement, and further show that this high-level task can be better addressed by foundation models. Finally, by fine-tuning SR models based on our fidelity feedback, we show that both semantic fidelity and perceptual quality can be improved, demonstrating the potential value of our proposed criteria in both model evaluation and optimization. We will release the dataset, code, and models upon acceptance.
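The abstract describes checking how well existing image quality metrics correlate with annotated fidelity scores. A minimal sketch of such a check follows. Spearman rank correlation is an assumption here (it is a common choice in image quality assessment, but the text above does not name the statistic), and all function names and example values are hypothetical.

```python
import math
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Return 1-based ranks, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j to cover the whole run of values tied with position i.
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


def spearman(metric_scores: Sequence[float], human_scores: Sequence[float]) -> float:
    """Spearman correlation: Pearson correlation computed on the ranks."""
    if len(metric_scores) != len(human_scores) or len(metric_scores) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    return pearson(average_ranks(metric_scores), average_ranks(human_scores))


if __name__ == "__main__":
    # Hypothetical values for illustration only, not data from the paper.
    metric = [0.91, 0.40, 0.75, 0.62]
    human = [0.10, 0.80, 0.30, 0.35]
    print(spearman(metric, human))
```

A metric that tracks the annotations well would give a correlation of large magnitude. The sign depends on whether the metric and the annotations share the same "higher is better" or "lower is better" convention.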

Paper Structure

This paper contains 14 sections, 1 equation, 7 figures, 4 tables.

Figures (7)

  • Figure 1: When evaluating SR models, high-level fidelity reflects aspects of an image that are complementary to the commonly assessed visual quality. Left: an illustration that, besides perceptual visual quality (evaluated vertically), models can differ in how well they preserve fidelity (evaluated horizontally). Right: SR examples corresponding to the four quadrants. Note that recent diffusion-based SR models can achieve high visual quality yet drastically change image fidelity (upper left), leading to less convincing results. Please zoom in for a better view.
  • Figure 2: Three different types of high-level fidelity change. From top to bottom: fidelity changes in details (text and the QR code), fidelity changes in local structure (note the alteration of a cloud to the bird's wing, which reduces the reliability of the image), and fidelity changes in holistic semantics (the man in a quad is changed to a parakeet). Please zoom in for a better view.
  • Figure 3: Our custom interface for the user study. Users are shown two images side by side. One image is the Ground Truth and the other is the output of a Super-Resolution model. Users are asked whether there is a high-level fidelity difference between the two images and can answer either "Yes" or "No".
  • Figure 4: Statistical distribution of the fidelity scores in our dataset. Lower means better. The fidelity scores of different SR models are labeled in different colors.
  • Figure 5: Method scheme. We first pass the two images, SR and GT, through the backbone and then compute the cosine similarity between the output embeddings. During training, we regress this score to the dataset ground truth.
  • ...and 2 more figures
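
The scheme in the Figure 5 caption (embed the SR and GT images with a shared backbone, take the cosine similarity of the two embeddings, and regress that score to the annotation) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: `toy_embed` is a hypothetical stand-in for a foundation-model backbone, the squared-error loss is an assumed choice of regression loss, and the target is assumed to be already expressed on the same scale as the similarity.

```python
import math
from typing import Callable, List, Sequence

Vector = Sequence[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine of the angle between two embedding vectors."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def toy_embed(image: Sequence[float]) -> List[float]:
    """Hypothetical stand-in for a backbone: maps a flat list of pixel
    intensities to a tiny 'embedding' of summary statistics."""
    return [sum(image) / len(image), min(image), max(image)]


def predicted_fidelity(
    embed: Callable[[Sequence[float]], Vector],
    sr_image: Sequence[float],
    gt_image: Sequence[float],
) -> float:
    """Pass both images through the same backbone and compare embeddings."""
    return cosine_similarity(embed(sr_image), embed(gt_image))


def regression_loss(predicted: float, target: float) -> float:
    """Squared error between the predicted score and the annotated score
    (assumed loss; the caption only says the score is regressed)."""
    return (predicted - target) ** 2


if __name__ == "__main__":
    # Hypothetical pixel values for illustration only.
    gt = [0.1, 0.5, 0.9, 0.3]
    sr = [0.2, 0.5, 0.7, 0.4]
    score = predicted_fidelity(toy_embed, sr, gt)
    print(score, regression_loss(score, target=1.0))
```

In a real setup the backbone would be a trainable network and the loss would be minimized over the annotated dataset. Note that the Figure 4 caption states lower fidelity scores are better, so the annotations would need to be mapped to the similarity's convention (or vice versa) before regression; that mapping is not specified in the captions above.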