Self-Supervised Image Super-Resolution Quality Assessment based on Content-Free Multi-Model Oriented Representation Learning

Kian Majlessi, Amir Masoud Soltani, Mohammad Ebrahim Mahdavi, Aurelien Gourrier, Peyman Adibi

TL;DR

This work tackles no-reference quality assessment for real-world super-resolution where degradations are unpredictable and content-dependent cues are unreliable. It introduces S^3RIQA, a self-supervised framework that learns content-free, model-aware SR distortions via a contrastive pretext task and an auxiliary scaling-factor prediction objective. A new SRMORSS dataset provides large-scale, diverse pretext data spanning many SR methods and scales to learn a robust SR distortion manifold, enabling strong domain adaptation to RealSR-based SR-IQA benchmarks. The approach reduces dependence on HR references and labeled subjective scores, achieving state-of-the-art correlations on RealSRQ, SRIJ, and SREB while maintaining simple downstream mappings (ridge regression) for quality prediction.

Abstract

Super-resolution (SR) applied to real-world low-resolution (LR) images often results in complex, irregular degradations that stem from the inherent complexity of natural scene acquisition. In contrast to SR artifacts arising from synthetic LR images created under well-defined scenarios, these distortions are highly unpredictable and vary significantly across real-life contexts. Consequently, assessing the quality of SR images (SR-IQA) obtained from realistic LR inputs remains a challenging and underexplored problem. In this work, we introduce a no-reference SR-IQA approach tailored for such highly ill-posed realistic settings. The proposed method enables domain-adaptive IQA for real-world SR applications, particularly in data-scarce domains. We hypothesize that degradations in super-resolved images depend strongly on the underlying SR algorithm rather than being determined solely by image content. To this end, we introduce a self-supervised learning (SSL) strategy that first pretrains multiple SR-model-oriented representations in a pretext stage. Our contrastive learning framework forms positive pairs from images produced by the same SR model and negative pairs from images generated by different methods, independent of image content. The proposed approach, S^3RIQA, further incorporates targeted preprocessing to extract complementary quality information and an auxiliary task to better handle the different degradation profiles associated with different SR scaling factors. To support unsupervised pretext training, we constructed a new dataset, SRMORSS; it includes a wide range of SR algorithms applied to numerous real LR images, which addresses a gap in existing datasets. Experiments on real SR-IQA benchmarks demonstrate that S^3RIQA consistently outperforms most relevant state-of-the-art metrics.
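
The pair-formation rule in the abstract (positives share an SR model, negatives come from different SR models, content is ignored) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the loss form (a supervised-contrastive-style variant over SR-model labels), the temperature, the mean-squared-error auxiliary term, and the names `model_contrastive_loss`, `pretext_loss`, and `aux_weight` are all assumptions made for illustration.

```python
import numpy as np

def model_contrastive_loss(z, model_ids, temperature=0.1):
    """Contrastive loss whose positives are crops produced by the same SR model.

    z: (n, d) projected representations; model_ids: (n,) integer SR-model labels.
    Image content plays no role in deciding which pairs are positive.
    """
    model_ids = np.asarray(model_ids)
    z = z / np.linalg.norm(z, axis=1, keepdims=True)      # unit-normalise projections
    sim = z @ z.T / temperature                           # scaled cosine similarities
    self_mask = np.eye(len(model_ids), dtype=bool)
    sim = np.where(self_mask, -np.inf, sim)               # a crop is never its own pair
    # Numerically stable log-softmax over all other crops in the batch.
    sim_max = sim.max(axis=1, keepdims=True)
    log_prob = sim - sim_max - np.log(np.exp(sim - sim_max).sum(axis=1, keepdims=True))
    # Positives: same SR model, different crop. Everything else is a negative.
    pos = (model_ids[:, None] == model_ids[None, :]) & ~self_mask
    pos_count = pos.sum(axis=1)
    valid = pos_count > 0                                 # skip anchors without a positive
    pos_log_prob = np.where(pos, log_prob, 0.0).sum(axis=1)
    return float(-(pos_log_prob[valid] / pos_count[valid]).mean())

def pretext_loss(z, scale_pred, model_ids, scales, aux_weight=1.0):
    """Contrastive term plus an assumed MSE term for scaling-factor prediction."""
    aux = float(np.mean((np.asarray(scale_pred) - np.asarray(scales)) ** 2))
    return model_contrastive_loss(z, model_ids) + aux_weight * aux

# Toy batch: two crops from SR model 0 and two from SR model 1.
model_ids = np.array([0, 0, 1, 1])
z_by_model = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]])
z_by_content = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0]])
print(model_contrastive_loss(z_by_model, model_ids))     # embeddings grouped by SR model
print(model_contrastive_loss(z_by_content, model_ids))   # embeddings grouped otherwise
```

In the toy batch, embeddings grouped by SR model give a lower loss than embeddings grouped any other way, which is the behaviour the content-free objective is meant to reward.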

Paper Structure (24 sections, 11 equations, 4 figures, 7 tables)


Figures (4)

  • Figure 1: The pretext architecture of $\mathrm{S^3RIQA}$ consists of three components: (1) For each SR image $sr_l^j$, a positive sample $sr_k^j$ is generated by taking a random crop of an image produced by the same SR method but with different content. Both crops are then processed using color-space and random-flip transformations. (2) Each transformed crop is passed through a symmetric SSL backbone, consisting of an encoder followed by a projection head, to obtain latent representations $h$ and projected representations $z$, respectively; a contrastive loss is computed on $z$. (3) To encourage more robust and scale-aware representations, an auxiliary regression task is applied to the latent features $h$ to predict the scaling factor.
  • Figure 2: Visual comparison between the HR image (left) and the corresponding SR images (right) with a scaling factor of 4.
  • Figure 3: Partial visualization of the latent space using the t-SNE algorithm (van der Maaten and Hinton, 2008), showing the representations of images from the SwinIR, VDSR, and RealSRGAN models with scaling factors of $\times2$, $\times3$, and $\times4$. The clusters indicate content-free, model-oriented embeddings that drive the superior performance of $\mathrm{S^3RIQA}$.
  • Figure 4: Evaluation phase of the downstream stage. A test sample $sr^{tst}$ and its downsampled version first undergo a cropping step. All crops are then fed into the encoder trained during the pretext stage. The embeddings of each crop and its downsampled counterpart are concatenated (denoted by the symbol $\oslash$) and provided as input to the regression model trained on the $D^{trn}$ dataset. The final predicted quality score $q^{tst}$ is computed by averaging the predictions across all crops.
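
The evaluation flow described in the Figure 4 caption (crop, encode, concatenate with the downsampled counterpart, regress, average) can be sketched as follows. This is a hedged illustration, not the paper's code: `toy_encoder` is a hypothetical stand-in for the pretrained encoder, the non-overlapping grid cropping and the naive 2x stride downsampling are assumptions, and the closed-form `fit_ridge` stands in for whatever ridge regression solver is actually used.

```python
import numpy as np

def grid_crops(img, size):
    """Split a 2-D image into non-overlapping size x size crops (assumed scheme)."""
    h, w = img.shape
    return np.stack([img[r:r + size, c:c + size]
                     for r in range(0, h - size + 1, size)
                     for c in range(0, w - size + 1, size)])

def toy_encoder(crops):
    """Hypothetical encoder: maps each crop to two simple statistics."""
    flat = crops.reshape(len(crops), -1)
    return np.stack([flat.mean(axis=1), flat.std(axis=1)], axis=1)

def image_features(img, encoder, size=8):
    """Concatenate each crop embedding with that of its downsampled counterpart.

    Assumes the image sides are multiples of `size` and that `size` is even,
    so both crop grids contain the same number of crops.
    """
    full = encoder(grid_crops(img, size))
    down = encoder(grid_crops(img[::2, ::2], size // 2))  # naive 2x downsampling
    return np.concatenate([full, down], axis=1)

def fit_ridge(X, y, alpha=1.0):
    """Closed-form ridge regression with an unpenalised intercept."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc = X - x_mean
    w = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(X.shape[1]), Xc.T @ (y - y_mean))
    return w, y_mean - x_mean @ w

def predict_quality(img, encoder, w, b, size=8):
    """Predict one score per crop, then average over crops to get the image score."""
    feats = image_features(img, encoder, size)
    return float((feats @ w + b).mean())

# Toy usage with random images and made-up quality scores.
rng = np.random.default_rng(0)
train_imgs = [rng.random((32, 32)) * s for s in (0.2, 0.5, 0.8, 1.0)]
train_scores = [1.0, 2.0, 3.0, 4.0]
# Assumption: every training crop inherits the score of its source image.
X = np.concatenate([image_features(im, toy_encoder) for im in train_imgs])
y = np.concatenate([np.full(16, s) for s in train_scores])
w, b = fit_ridge(X, y, alpha=1.0)
print(predict_quality(rng.random((32, 32)) * 0.6, toy_encoder, w, b))
```

Averaging over crops keeps the regressor itself small and lets images of different sizes share one model, since only the number of crops changes.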