Image-Difficulty-Aware Evaluation of Super-Resolution Models

Atakan Topaloglu, Ahmet Bilican, Cansu Korkmaz, A. Murat Tekalp

TL;DR

The paper tackles the inadequacy of average SR evaluation metrics, which can conceal how models perform on images of varying difficulty. It introduces two image-difficulty measures, $HFI$ and $RIEI$, to predict challenging images and proposes a difficulty-aware evaluation framework that uses quadrant-based PSNR analysis in the $HFI$-$RIEI$ plane alongside a localized artifact metric, $PSNR99$. Through case studies across different SR approaches, the authors demonstrate that these measures reveal performance patterns hidden by average PSNR and help identify where models excel or struggle on specific content types. The work has practical implications for guiding model design (e.g., mixture of experts) and for refinement of loss functions to mitigate artifacts on hard images, improving SR benchmarking and development.

Abstract

Image super-resolution models are commonly evaluated by scores averaged over a benchmark test set. Such averages fail to reflect how the models perform on images of varying difficulty, and they hide the artifacts that some models generate on certain difficult images. We propose difficulty-aware performance evaluation procedures to better differentiate between single-image super-resolution (SISR) models that produce visually different results on some images but yield close average scores over the entire test set. In particular, we propose two image-difficulty measures, the high-frequency index (HFI) and the rotation-invariant edge index (RIEI), to predict the test images on which one model would yield significantly better visual results than another, together with an evaluation method in which these visual differences are reflected in objective measures. Experimental results demonstrate the effectiveness of the proposed image-difficulty measures and evaluation methodology.
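
The quadrant-based analysis mentioned in the TL;DR can be illustrated with a minimal sketch. It assumes that per-image HFI, RIEI, and PSNR values have already been computed; the function name `quadrant_mean_psnr`, the quadrant labels, and the use of dataset medians as the quadrant boundaries are illustrative assumptions, not the authors' procedure. The toy numbers are made up purely to show how two models with identical average PSNR can differ within a quadrant.

```python
import numpy as np


def quadrant_mean_psnr(hfi, riei, psnr, hfi_thr=None, riei_thr=None):
    """Mean PSNR and image count per quadrant of the HFI-RIEI plane.

    Assumption: when no thresholds are given, the per-dataset medians
    are used as quadrant boundaries (the paper may choose differently).
    """
    hfi = np.asarray(hfi, dtype=float)
    riei = np.asarray(riei, dtype=float)
    psnr = np.asarray(psnr, dtype=float)

    hfi_thr = np.median(hfi) if hfi_thr is None else hfi_thr
    riei_thr = np.median(riei) if riei_thr is None else riei_thr

    high_hfi = hfi > hfi_thr
    high_riei = riei > riei_thr

    # Hypothetical quadrant labels, chosen only for readability.
    masks = {
        "low_hfi_low_riei": ~high_hfi & ~high_riei,
        "low_hfi_high_riei": ~high_hfi & high_riei,
        "high_hfi_low_riei": high_hfi & ~high_riei,
        "high_hfi_high_riei": high_hfi & high_riei,
    }

    result = {}
    for name, mask in masks.items():
        count = int(mask.sum())
        mean = float(psnr[mask].mean()) if count > 0 else float("nan")
        result[name] = (mean, count)
    return result


# Toy data (fabricated for illustration only, not from the paper).
hfi = [1.0, 2.0, 3.0, 4.0]
riei = [1.0, 3.0, 2.0, 4.0]
psnr_model_a = [30.0, 28.0, 26.0, 24.0]
psnr_model_b = [31.0, 28.0, 26.0, 23.0]

quadrants_a = quadrant_mean_psnr(hfi, riei, psnr_model_a)
quadrants_b = quadrant_mean_psnr(hfi, riei, psnr_model_b)

# Both models have the same average PSNR over the whole toy set ...
print(np.mean(psnr_model_a), np.mean(psnr_model_b))
# ... but they differ in the high-HFI, high-RIEI quadrant.
print(quadrants_a["high_hfi_high_riei"], quadrants_b["high_hfi_high_riei"])
```

Reporting the image count next to each quadrant mean is a deliberate choice in this sketch: a quadrant with very few images gives a noisy mean, which is worth seeing before drawing conclusions from it.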

Paper Structure

This paper contains 10 sections, 2 equations, 9 figures, 3 tables, 1 algorithm.

Figures (9)

  • Figure 1: Average PSNR over a validation/test set does not reveal the performance difference between two models, illustrated on a crop from image 65 of the LSDIR (Li et al., 2023) validation set: (a) high-resolution ground-truth crop, (b) super-resolved crop using the edge model (average PSNR 26.1842 dB), (c) super-resolved crop using the global model (average PSNR 26.1825 dB).
  • Figure 2: HF-Index Computation
  • Figure 3: SR-PSNR vs. HFI for $\times 4$ super-resolution of images in the LSDIR validation set using SwinIR (Liang et al., 2021).
  • Figure 4: Edge Index (EI) Computation
  • Figure 5: Images with 45$^\circ$ edges from the Urban100 dataset where RIEI gives reliable scores. (a) Image 068 (RIEI: 6.240, EI: 1.743), (b) Image 081 (RIEI: 7.943, EI: 1.311).
  • ...and 4 more figures