SEAL: A Framework for Systematic Evaluation of Real-World Super-Resolution

Wenlong Zhang, Xiaohui Li, Xiangyu Chen, Yu Qiao, Xiao-Ming Wu, Chao Dong

TL;DR

SEAL addresses bias in real-world SR evaluation by clustering a vast degradation space into representative degradation centers and assessing methods with a coarse-to-fine protocol built around two metrics, acceptance rate (AR) and relative performance ratio (RPR). It introduces distributed and relative evaluation, defines AR and RPR against explicit acceptance and excellence lines, and constructs representative SE test sets from the cluster centers. In extensive experiments on MSE-based and GAN-based real-SR methods, SEAL reveals distributional performance patterns, provides robust rankings, and surfaces insights that conventional evaluation obscures. The framework adapts to different IQA metrics and test-set configurations, offering a practical path toward a comprehensive real-SR evaluation platform and guiding the development of stronger real-SR models.

Abstract

Real-world Super-Resolution (Real-SR) methods focus on dealing with diverse real-world images and have attracted increasing attention in recent years. The key idea is to use a complex and high-order degradation model to mimic real-world degradations. Although they have achieved impressive results in various scenarios, they are faced with the obstacle of evaluation. Currently, these methods are only assessed by their average performance on a small set of degradation cases randomly selected from a large space, which fails to provide a comprehensive understanding of their overall performance and often yields inconsistent and potentially misleading results. To overcome the limitation in evaluation, we propose SEAL, a framework for systematic evaluation of real-SR. In particular, we cluster the extensive degradation space to create a set of representative degradation cases, which serves as a comprehensive test set. Next, we propose a coarse-to-fine evaluation protocol to measure the distributed and relative performance of real-SR methods on the test set. The protocol incorporates two new metrics: acceptance rate (AR) and relative performance ratio (RPR), derived from acceptance and excellence lines. Under SEAL, we benchmark existing real-SR methods, obtain new observations and insights into their performance, and develop a new strong baseline. We consider SEAL as the first step towards creating a comprehensive real-SR evaluation platform, which can promote the development of real-SR. The source code is available at https://github.com/XPixelGroup/SEAL
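
The idea of turning a large degradation space into a small set of representative cases can be illustrated with a minimal sketch. It assumes each degradation is described by a numeric parameter vector (here a hypothetical pair of blur sigma and noise level) and uses plain k-means with farthest-point initialization as a stand-in; the abstract does not say which clustering algorithm or features SEAL uses, so the function name, the parameters, and the choice of k-means are all assumptions made for illustration.

```python
def representative_degradations(points, k, iters=20):
    """Cluster parameter vectors with k-means (an illustrative stand-in) and
    return, for each cluster, the index of the sample nearest its center."""
    if k < 1 or k > len(points):
        raise ValueError("k must be between 1 and the number of points")

    def sqdist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    # Deterministic farthest-point initialization: start from the first
    # sample, then repeatedly add the sample farthest from all chosen centers.
    centers = [list(points[0])]
    while len(centers) < k:
        nxt = max(points, key=lambda p: min(sqdist(p, c) for c in centers))
        centers.append(list(nxt))

    for _ in range(iters):
        # Assignment step: each sample joins its nearest center.
        groups = [[] for _ in range(k)]
        for p in points:
            j = min(range(k), key=lambda c: sqdist(p, centers[c]))
            groups[j].append(p)
        # Update step: move each center to the mean of its members.
        # An empty cluster keeps its previous center.
        for j, members in enumerate(groups):
            if members:
                centers[j] = [sum(col) / len(members) for col in zip(*members)]

    # Use a real sample (not the mean) as the representative case, so the
    # chosen degradation is one that actually occurs in the sampled space.
    return [
        min(range(len(points)), key=lambda i: sqdist(points[i], centers[j]))
        for j in range(k)
    ]


# Hypothetical (blur_sigma, noise_level) samples forming two loose groups.
samples = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0), (5.1, 5.0), (5.0, 5.1)]
reps = representative_degradations(samples, k=2)
```

Returning sample indices rather than cluster means keeps each representative tied to a concrete degradation that can be applied to images to build one test set per cluster.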

Paper Structure (30 sections, 11 equations, 22 figures, 10 tables, 1 algorithm)

This paper contains 30 sections, 11 equations, 22 figures, 10 tables, 1 algorithm.

Figures (22)

  • Figure 1: (a) Average performance of BSRNet and RealESRNet on two real test sets generated by common practice. The results vary significantly: the differences between their average PSNR on the two test sets are -0.23dB and 0.18dB respectively, leading to contradictory conclusions. (b) BSRNet and RealESRNet assessed under our SEAL framework in a distributed manner with 100 representative test sets. The former outperforms the latter in 60$\%$ of cases, providing a comprehensive overview of their performance.
  • Figure 2: Our proposed evaluation framework consists of a clustering-based approach for degradation space modeling and a set of metrics based on representative degradation cases. We divide the degradation space into $K$ clusters and use the degradation parameters of the cluster centers to create $K$ training datasets, on which we train $K$ non-blind tiny / large SR models as the acceptance / excellence line. The distributed performance of the real-SR model across the $K$ test datasets is compared with the acceptance and excellence lines and evaluated by a set of metrics including $AR$ (acceptance rate), $RPR$ (relative performance ratio), $RPR_A$ (average $RPR$ on acceptable cases), and $RPR_U$ (average $RPR$ on unacceptable cases).
  • Figure 3: A coarse-to-fine evaluation protocol to rank real-SR models with the proposed metrics.
  • Figure 4: Visualization of distributed performance in PSNR for MSE-based real-SR methods on Set14-SE.
  • Figure 5: Visual results of MSE-based real-SR methods with the acceptance line FSRCNN and excellence line SRResNet. It is best viewed in color.
  • ...and 17 more figures
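
The metrics named in the Figure 2 caption can be sketched as follows. This excerpt does not give the exact formulas, so the sketch rests on stated assumptions: a case counts as acceptable when the model's score is at least the acceptance-line score, and $RPR$ is taken as the model's position between the acceptance and excellence lines on a linear scale. The paper's own definition may differ (for example in normalization), and the function and key names are hypothetical.

```python
def seal_style_metrics(model, acceptance, excellence):
    """Compute AR, per-case RPR, RPR_A and RPR_U from per-case quality scores
    (e.g. PSNR) on K test sets. Formulas are illustrative assumptions."""
    if not model or not (len(model) == len(acceptance) == len(excellence)):
        raise ValueError("all score lists must be non-empty and equal in length")

    rpr, accepted = [], []
    for q, q_ac, q_ex in zip(model, acceptance, excellence):
        if q_ex == q_ac:
            raise ValueError("excellence and acceptance scores must differ")
        # Assumed linear ratio: 0 at the acceptance line, 1 at the excellence line.
        rpr.append((q - q_ac) / (q_ex - q_ac))
        # Assumed acceptance rule: at or above the acceptance line.
        accepted.append(q >= q_ac)

    def mean_or_none(values):
        return sum(values) / len(values) if values else None

    return {
        "AR": sum(accepted) / len(model),
        "RPR": rpr,
        "RPR_A": mean_or_none([r for r, ok in zip(rpr, accepted) if ok]),
        "RPR_U": mean_or_none([r for r, ok in zip(rpr, accepted) if not ok]),
    }


# Made-up scores for K = 4 cases, chosen only to exercise the function.
result = seal_style_metrics(
    model=[26.0, 24.0, 25.0, 23.0],
    acceptance=[25.0, 25.0, 24.0, 24.0],
    excellence=[27.0, 27.0, 26.0, 26.0],
)
```

Reporting $RPR_A$ and $RPR_U$ separately shows how far a model sits above the acceptance line on the cases it passes and how far below on the cases it fails, which a single average would blur together.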