Cross-IQA: Unsupervised Learning for Image Quality Assessment

Zhen Zhang

TL;DR

Cross-IQA addresses no-reference image quality assessment (NR-IQA) with a vision-transformer-based framework that learns quality-relevant features from unlabeled data through a reconstruction pretext task. The model uses two shared encoders/decoders and a cross class token to exchange quality information; the encoder is then frozen and a linear regressor is trained on labeled datasets. It reports state-of-the-art performance on low-frequency degradations and generalizes to LIVE and TID2013 without requiring large labeled IQA datasets. The approach offers a scalable NR-IQA solution for content quality assessment under real-world degradation conditions.

Abstract

Automatic perception of image quality is a challenging problem that impacts billions of Internet and social media users daily. To advance research in this field, we propose a no-reference image quality assessment (NR-IQA) method, termed Cross-IQA, based on the vision transformer (ViT) model. The proposed Cross-IQA method can learn image quality features from unlabeled image data. We construct a pretext task of synthesized image reconstruction to extract image quality information in an unsupervised manner using ViT blocks. The pretrained encoder of Cross-IQA is then used to fine-tune a linear regression model for score prediction. Experimental results show that Cross-IQA achieves state-of-the-art performance in assessing low-frequency degradation information (e.g., color change, blurring) in images, compared with classical full-reference IQA and NR-IQA methods on the same datasets.
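
The score-prediction stage described above (a linear regression model on top of the pretrained encoder) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration, not the paper's implementation: the feature vectors are random stand-ins for encoder outputs, the regressor is a closed-form ridge fit, the function names are hypothetical, and Spearman rank correlation is used only because it is a common IQA evaluation metric.

```python
import numpy as np

def fit_linear_regressor(features, scores, l2=1e-3):
    """Closed-form ridge regression from fixed features to quality scores."""
    X = np.hstack([features, np.ones((features.shape[0], 1))])  # append bias column
    reg = l2 * np.eye(X.shape[1])
    reg[-1, -1] = 0.0  # leave the bias term unpenalized
    return np.linalg.solve(X.T @ X + reg, X.T @ scores)

def predict(features, weights):
    """Apply the fitted weights (last entry is the bias) to new features."""
    X = np.hstack([features, np.ones((features.shape[0], 1))])
    return X @ weights

def srocc(a, b):
    """Spearman rank correlation; this simple version assumes no tied values."""
    ra = np.argsort(np.argsort(a)).astype(float)
    rb = np.argsort(np.argsort(b)).astype(float)
    return float(np.corrcoef(ra, rb)[0, 1])

# Stand-in data: in practice `features` would come from the frozen encoder
# and `scores` from human opinion scores in a labeled IQA dataset.
rng = np.random.default_rng(0)
features = rng.normal(size=(200, 16))
true_w = rng.normal(size=16)
scores = features @ true_w + 0.1 * rng.normal(size=200)

weights = fit_linear_regressor(features[:150], scores[:150])  # train split
pred = predict(features[150:], weights)                        # held-out split
rho = srocc(pred, scores[150:])
```

Because the encoder is not updated in this stage, only the small weight vector is learned, which is why relatively few labeled images can suffice.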

Paper Structure

This paper contains 12 sections, 2 equations, 3 figures, and 4 tables.

Figures (3)

  • Figure 1: Schematic diagram of the proposed Cross-IQA.
  • Figure 2: Process of Cross-IQA Regression.
  • Figure 3: Example results on Waterloo database validation images. For each triplet, we show the original image (left), the image reconstructed by the Cross-IQA decoder (middle), and the synthetic degraded image (right).
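
As a rough illustration of what an original/degraded pair like those in Figure 3 might look like in code, the sketch below applies two low-frequency degradations of the kind the abstract names (blurring and color change). This section does not say how the paper synthesizes its degradations, so the box blur, the per-channel gains, and the function names are all illustrative assumptions.

```python
import numpy as np

def box_blur(image, radius):
    """Blur an H x W x C float image with a (2r+1) x (2r+1) mean filter."""
    k = 2 * radius + 1
    # Replicate edge pixels so the output keeps the input size.
    padded = np.pad(image, ((radius, radius), (radius, radius), (0, 0)), mode="edge")
    out = np.zeros_like(image, dtype=float)
    h, w = image.shape[:2]
    for dy in range(k):
        for dx in range(k):
            out += padded[dy:dy + h, dx:dx + w, :]
    return out / (k * k)

def color_shift(image, gains):
    """Scale each channel by its gain and clip the result to [0, 1]."""
    return np.clip(image * np.asarray(gains, dtype=float), 0.0, 1.0)

# Stand-in "original": random pixels in [0, 1]; a real pipeline would load an image.
rng = np.random.default_rng(0)
original = rng.random((32, 32, 3))
blurred = box_blur(original, radius=2)
degraded = color_shift(blurred, gains=(1.0, 0.8, 0.6))
```

In a reconstruction pretext task of the kind described in the abstract, such pairs need no human labels, since the degraded image is generated directly from the original.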