Toward Generalized Image Quality Assessment: Relaxing the Perfect Reference Quality Assumption

Du Chen, Tianhe Wu, Kede Ma, Lei Zhang

TL;DR

This work tackles the limitation of the perfect-reference assumption in FR-IQA by introducing DiffIQA, a large-scale dataset of images generated by a diffusion-based enhancer with variable quality relative to references, and A-FINE, a generalized FR-IQA model that adaptively balances fidelity and naturalness via a learned weighting. A-FINE leverages a ViT-CLIP backbone and a pairwise learning-to-rank objective under Thurstone's model to achieve state-of-the-art performance on standard IQA benchmarks and the new DiffIQA dataset, while also validating generalization on a new SR-focused benchmark, SRIQA-Bench. The combination of DiffIQA data, SRIQA-Bench validation, and the adaptive fidelity-naturalness framework enables robust quality assessment in the era of deep generative enhancements, with code and data made public for reproducibility.

Abstract

Full-reference image quality assessment (FR-IQA) generally assumes that reference images are of perfect quality. However, this assumption is flawed due to the sensor and optical limitations of modern imaging systems. Moreover, recent generative enhancement methods are capable of producing images of higher quality than their originals. Together, these issues challenge the effectiveness and applicability of current FR-IQA models. To relax the assumption of perfect reference image quality, we build a large-scale IQA database, namely DiffIQA, containing approximately 180,000 images generated by a diffusion-based image enhancer with adjustable hyper-parameters. Each image is annotated by human subjects as being of worse, similar, or better quality than its reference. Building on this, we present a generalized FR-IQA model, namely Adaptive Fidelity-Naturalness Evaluator (A-FINE), to accurately assess and adaptively combine the fidelity and naturalness of a test image. A-FINE aligns well with standard FR-IQA when the reference image is much more natural than the test image. We demonstrate by extensive experiments that A-FINE surpasses standard FR-IQA models on well-established IQA datasets and our newly created DiffIQA. To further validate A-FINE, we additionally construct a super-resolution IQA benchmark (SRIQA-Bench), encompassing test images derived from ten state-of-the-art SR methods with reliable human quality annotations. Tests on SRIQA-Bench re-affirm the advantages of A-FINE. The code and dataset are available at https://tianhewu.github.io/A-FINE-page.github.io/.

Paper Structure

This paper contains 22 sections, 12 equations, 11 figures, 7 tables.

Figures (11)

  • Figure 1: With the reference image in the middle, which image, A or B, has better perceived visual quality? The proposed A-FINE generalizes and outperforms standard FR-IQA models under both perfect and imperfect reference conditions. Zoom in for better visibility.
  • Figure 2: (a) Reference image from CSIQ [larson2010most] and (b) its corresponding enhanced image by a recent generation-based image enhancer, SeeSR [wu2024seesr].
  • Figure 3: Standard FR-IQA models tend to fail when the reference image is of non-perfect quality. In this visualization, images are embedded in a perceptually uniform space, where the perceived quality of a test image is described by its Euclidean distance to the perfect-quality image. Images located on the same dashed circles are perceived to have identical visual quality.
  • Figure 4: DiffIQA is constructed in two stages. In Stage 1, we adapt PASD [yang2023pixel] to a generative image enhancer (see the Appendix for more details) to produce images of varying perceptual quality, some of which are perceived as better than the original. In Stage 2, we conduct subjective experiments using incomplete paired comparison, followed by raw subjective data filtering.
  • Figure 5: System diagram of the proposed A-FINE and its pairwise learning-to-rank training procedure. A-FINE leverages a shared feature transformation to make image fidelity and naturalness measurements, which are adaptively combined to produce the final quality score.
  • ...and 6 more figures
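
The pairwise learning-to-rank training mentioned in the Figure 5 caption and the TL;DR can be sketched as follows. Under Thurstone's Case V model, if two images have predicted qualities q_A and q_B with equal, independent Gaussian noise of standard deviation sigma, the probability that A is preferred is Phi((q_A - q_B) / (sqrt(2) * sigma)). The unit-variance default and the choice of the fidelity loss as the training objective are assumptions here; this page only states that the objective is pairwise and based on Thurstone's model.

```python
import math


def thurstone_prob(q_a: float, q_b: float, sigma: float = 1.0) -> float:
    """P(A is perceived as better than B) under Thurstone's Case V.

    q_a, q_b: predicted quality scores (higher = better)
    sigma:    assumed common standard deviation of the perceptual noise
    """
    z = (q_a - q_b) / (math.sqrt(2.0) * sigma)
    # Standard normal CDF expressed through the error function.
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def fidelity_loss(p_true: float, p_pred: float) -> float:
    """Fidelity loss between a ground-truth and a predicted preference
    probability (an assumed objective; 0 when the two agree exactly)."""
    return (1.0
            - math.sqrt(p_true * p_pred)
            - math.sqrt((1.0 - p_true) * (1.0 - p_pred)))


# Example: humans judged image A better than image B (p_true = 1.0).
p_hat = thurstone_prob(q_a=2.0, q_b=0.0)
loss = fidelity_loss(1.0, p_hat)
```

Because the loss depends only on the score difference, a model trained this way learns a ranking-consistent quality scale rather than absolute ratings, which suits datasets labeled with worse/similar/better judgments.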