Augmenting Perceptual Super-Resolution via Image Quality Predictors

Fengjia Zhang, Samrudhdhi B. Rangrej, Tristan Aumentado-Armstrong, Afsaneh Fazly, Alex Levinshtein

TL;DR

This work tackles the ill-posed nature of single-image super-resolution by leveraging no-reference IQA predictors to guide training, either through IQA-weighted sampling of multiple enhanced ground-truths or through differentiable optimization of image quality. By systematically analyzing NR-IQA metrics on SBS180K and HGGT, the authors identify MUSIQ (and complementary metrics like NIMA and Q-Align) as robust signals and implement two NR-IQA-based strategies: reweighted GT sampling (SMA, SMP, AMO) and direct optimization with regularization via LoRA. The combination of Argmax-online sampling and NR-IQA-guided fine-tuning (AMO+FT) achieves state-of-the-art perceptual-quality SR without human annotations, outperforming human-guided positives-only baselines on NR metrics and receiving favorable user-study preferences. The results demonstrate a scalable path to enhancing perceptual SR quality through existing NR-IQA models, with practical implications for real-world SR under domain shift and subjective quality assessments.

Abstract

Super-resolution (SR), a classical inverse problem in computer vision, is inherently ill-posed, inducing a distribution of plausible solutions for every input. However, the desired result is not simply the expectation of this distribution, which is the blurry image obtained by minimizing pixelwise error, but rather the sample with the highest image quality. A variety of techniques, from perceptual metrics to adversarial losses, are employed to this end. In this work, we explore an alternative: utilizing powerful non-reference image quality assessment (NR-IQA) models in the SR context. We begin with a comprehensive analysis of NR-IQA metrics on human-derived SR data, identifying both the accuracy (human alignment) and complementarity of different metrics. Then, we explore two methods of applying NR-IQA models to SR learning: (i) altering data sampling, by building on an existing multi-ground-truth SR framework, and (ii) directly optimizing a differentiable quality score. Our results demonstrate a more human-centric perception-distortion tradeoff, focusing less on non-perceptual pixel-wise distortion, instead improving the balance between perceptual fidelity and human-tuned NR-IQA measures.
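
As a rough illustration of strategy (i), the sketch below turns per-candidate NR-IQA scores into sampling weights over several enhanced ground-truths, and also shows a deterministic argmax pick. The score values, the temperature parameter, and all function names are hypothetical and chosen only for illustration; the paper's actual samplers (SMA, SMP, AMO) may differ in detail.

```python
import math
import random


def softmax_weights(scores, temperature=1.0):
    """Turn per-candidate quality scores into sampling probabilities."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp((s - peak) / temperature) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def sample_ground_truth(scores, temperature=1.0, rng=random):
    """Sample the index of one enhanced ground-truth, favouring higher scores."""
    weights = softmax_weights(scores, temperature)
    return rng.choices(range(len(scores)), weights=weights, k=1)[0]


def argmax_ground_truth(scores):
    """Deterministically pick the index of the highest-scoring candidate."""
    return max(range(len(scores)), key=lambda i: scores[i])


# Hypothetical NR-IQA scores for four enhanced ground-truths of one image.
scores = [62.1, 70.4, 68.9, 55.0]
probs = softmax_weights(scores, temperature=5.0)
chosen = sample_ground_truth(scores, temperature=5.0, rng=random.Random(0))
best = argmax_ground_truth(scores)
```

In this sketch, lowering the temperature concentrates the probability mass on the best-scoring candidate, so the softmax sampler approaches the argmax pick in the limit.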

Paper Structure

This paper contains 27 sections, 3 equations, 8 figures, 8 tables.

Figures (8)

  • Figure 1: Schematics for improving perceptual super-resolution (SR). Perceptual quality of SR can be improved in two ways: (Left) providing supervision through multiple enhanced ground-truths (EGT) or (Right) direct optimization for the quality of the super-resolved image. In both cases, human-in-the-loop can greatly improve performance. However, manual annotation is tedious, imprecise, and non-differentiable. An IQA metric can replace a human in rating the enhanced ground-truths or can directly act as a differentiable optimization objective. In this paper, we specifically assess whether more practical no-reference (NR) IQA metrics can replace human raters for SR. We find that combining NR-IQA-based sampling and regularized optimization is sufficient to attain state-of-the-art perceptual image quality, without requiring human ratings.
  • Figure 2: Fine-Grained Comparison via NR-IQA. MUSIQ can differentiate the quality of two images, both marked as "positive" by human annotators. Higher MUSIQ indicates higher quality (zoom for details). Unlike HGGT models, which utilize a uniform distribution over positives, our approach enables differently weighting them (§\ref{sec:methods:rs}).
  • Figure 3: Structured Optimization Noise. Optimizing via an NR-IQA metric (MUSIQ \cite{ke2021musiq}) generates structured artifacts (left), similar to an adversarial attack, while utilizing LoRA removes this noise (right; see §\ref{sec:methods:do} and Supp. §\ref{supp:sec:optart}). Zoom in for details.
  • Figure 4: Qualitative results with NR-IQA Guidance. Following the notation of Table \ref{tab:mainresults}, columns 3-5 are (top 2 rows) SwinIR-UPos, SwinIR-AMO, and SwinIR-AMO + FT, and (bottom 2 rows) Real-ESRGAN-UPos, Real-ESRGAN-AMO, and Real-ESRGAN-AMO + FT. We show MUSIQ scores in insets. Qualitatively, results improve as we move from 'UPos' to 'AMO' to 'AMO + FT', each method outperforming the previous one. Zoom in for details. See also Supp. §\ref{supp:sec:moreexamples} for additional examples.
  • Figure 5: Structured Noise due to naive NR-IQA optimization. The left three insets show an image and two close-ups produced by a model fine-tuned without LoRA, whereas the right three show the effect of using LoRA. Note the patterns that form in the sky and the strangely coloured pixels that appear around certain edges (e.g., the blue/red grid in the second inset) when LoRA is not used.
  • ...and 3 more figures
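
Figures 3 and 5 contrast NR-IQA fine-tuning with and without LoRA. As a reminder of what LoRA constrains, the block below writes out the standard low-rank weight update, followed by a schematic objective. The notation is assumed here and is not taken from the paper's own equations: $f$ denotes the SR network, $x$ a low-resolution input, $Q$ a differentiable NR-IQA score such as MUSIQ, and $W_0$ the frozen pretrained weights. The paper's actual loss may contain further terms.

```latex
\begin{aligned}
W &= W_0 + \Delta W, \qquad \Delta W = B A, \\
B &\in \mathbb{R}^{d \times r}, \quad A \in \mathbb{R}^{r \times k}, \quad r \ll \min(d, k), \\
(A^{*}, B^{*}) &= \arg\max_{A,\,B} \; \mathbb{E}_{x}\!\left[\, Q\!\left( f_{W_0 + BA}(x) \right) \right].
\end{aligned}
```

Because only the low-rank factors $A$ and $B$ are trained, the update is restricted to a rank-$r$ subspace, which is one plausible reading of why LoRA acts as a regularizer against the adversarial-style artifacts shown in the figures.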