Toward Generalized Image Quality Assessment: Relaxing the Perfect Reference Quality Assumption
Du Chen, Tianhe Wu, Kede Ma, Lei Zhang
TL;DR
This work addresses the perfect-reference assumption in FR-IQA by introducing DiffIQA, a large-scale dataset of images generated by a diffusion-based enhancer whose outputs vary in quality relative to their references, and A-FINE, a generalized FR-IQA model that adaptively balances fidelity and naturalness via a learned weighting. A-FINE uses a ViT-CLIP backbone and a pairwise learning-to-rank objective under Thurstone's model, and achieves state-of-the-art performance on standard IQA benchmarks and on the new DiffIQA dataset; its generalization is further validated on a new SR-focused benchmark, SRIQA-Bench. Together, DiffIQA, SRIQA-Bench, and the adaptive fidelity-naturalness framework support robust quality assessment in the era of deep generative enhancement, and the code and data are publicly available for reproducibility.
Abstract
Full-reference image quality assessment (FR-IQA) generally assumes that reference images are of perfect quality. However, this assumption is flawed due to the sensor and optical limitations of modern imaging systems. Moreover, recent generative enhancement methods are capable of producing images of higher quality than their originals. Together, these issues challenge the effectiveness and applicability of current FR-IQA models. To relax the assumption of perfect reference image quality, we build a large-scale IQA database, namely DiffIQA, containing approximately 180,000 images generated by a diffusion-based image enhancer with adjustable hyper-parameters. Each image is annotated by human subjects as being of worse, similar, or better quality compared to its reference. Building on this, we present a generalized FR-IQA model, namely Adaptive Fidelity-Naturalness Evaluator (A-FINE), to accurately assess and adaptively combine the fidelity and naturalness of a test image. A-FINE aligns well with standard FR-IQA when the reference image is much more natural than the test image. We demonstrate by extensive experiments that A-FINE surpasses standard FR-IQA models on well-established IQA datasets and our newly created DiffIQA. To further validate A-FINE, we additionally construct a super-resolution IQA benchmark (SRIQA-Bench), encompassing test images derived from ten state-of-the-art super-resolution (SR) methods with reliable human quality annotations. Tests on SRIQA-Bench reaffirm the advantages of A-FINE. The code and dataset are available at https://tianhewu.github.io/A-FINE-page.github.io/.
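
To make the two technical ideas above more concrete, the following is a minimal Python sketch, not the authors' implementation. The Thurstone Case V preference probability is the standard formula; everything else is an assumption made for illustration: the function names, the mapping of the worse/similar/better labels to targets 0, 0.5, and 1, the use of cross-entropy as the ranking loss, and the sigmoid-based weighting in `toy_adaptive_score`, which is a toy stand-in and not the A-FINE formula. All scores here follow a "higher is better" convention.

```python
import math

def thurstone_prob(q_a: float, q_b: float, sigma: float = 1.0) -> float:
    """P(A is preferred over B) under Thurstone's Case V model.

    Each perceived quality is Gaussian with std `sigma`, so the difference
    has std sqrt(2) * sigma and P = Phi((q_a - q_b) / (sqrt(2) * sigma)).
    """
    z = (q_a - q_b) / (math.sqrt(2.0) * sigma)
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))  # standard normal CDF

# Assumed mapping from the three human labels to soft preference targets.
LABEL_TO_TARGET = {"worse": 0.0, "similar": 0.5, "better": 1.0}

def pairwise_loss(q_test: float, q_ref: float, label: str, eps: float = 1e-7) -> float:
    """Cross-entropy between the predicted preference and the label target.

    Cross-entropy is an assumed choice of ranking loss for this sketch.
    """
    p = min(max(thurstone_prob(q_test, q_ref), eps), 1.0 - eps)
    t = LABEL_TO_TARGET[label]
    return -(t * math.log(p) + (1.0 - t) * math.log(1.0 - p))

def toy_adaptive_score(fidelity: float, nat_test: float, nat_ref: float,
                       k: float = 4.0) -> float:
    """Hypothetical adaptive combination of fidelity and naturalness.

    The weight on fidelity approaches 1 when the reference is much more
    natural than the test image, so the score then behaves like a standard
    FR-IQA (fidelity-only) score. `k` is a hypothetical sharpness parameter.
    """
    w = 1.0 / (1.0 + math.exp(-k * (nat_ref - nat_test)))
    return w * fidelity + (1.0 - w) * nat_test

if __name__ == "__main__":
    print(thurstone_prob(1.0, 1.0))            # equal qualities
    print(pairwise_loss(2.0, 0.0, "better"))   # prediction agrees with label
    print(pairwise_loss(2.0, 0.0, "worse"))    # prediction contradicts label
    print(toy_adaptive_score(0.3, 0.1, 0.9, k=50.0))
```

In this sketch, a test image predicted to be much better than its reference incurs a small loss when annotators labeled it "better" and a large loss when they labeled it "worse", which is the behavior a pairwise learning-to-rank objective is meant to encourage.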
