Attention Down-Sampling Transformer, Relative Ranking and Self-Consistency for Blind Image Quality Assessment

Mohammed Alsaafin, Musab Alsheikh, Saeed Anwar, Muhammad Usman

TL;DR

The paper presents a self-consistency approach to self-supervision that explicitly addresses the degradation of no-reference image quality assessment (NR-IQA) models under equivariant transformations. It keeps the model robust by maintaining consistency between an image and its horizontally flipped equivalent.

Abstract

No-reference image quality assessment (NR-IQA) is a challenging task: estimating image quality without access to the original reference. We introduce an improved mechanism for extracting local and non-local information from images using CNNs and different Transformer encoders. The Transformer encoders mitigate locality bias and produce a non-local representation by sequentially processing CNN features, which inherently capture local visual structures. To establish a stronger connection between subjective and objective assessments, we sort images within each batch based on relative distance information. We also present a self-consistency approach to self-supervision that explicitly addresses the degradation of NR-IQA models under equivariant transformations: it keeps the model robust by maintaining consistency between an image and its horizontally flipped equivalent. In an empirical evaluation on five popular image quality assessment datasets, the proposed model outperforms alternative NR-IQA algorithms, especially on smaller datasets. Code is available at https://github.com/mas94/ADTRS
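
The two training signals named in the abstract, flip self-consistency and within-batch relative ranking, can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation (see the repository linked above for that): the function names, the toy scoring models, and the generic pairwise hinge used for ranking are all assumptions made for illustration, and the paper's actual loss formulations may differ.

```python
from typing import Callable, List, Sequence

# Hypothetical representation: a grayscale image as rows of pixel intensities.
Image = List[List[float]]


def hflip(image: Image) -> Image:
    """Return a horizontally flipped copy (each row reversed)."""
    return [list(reversed(row)) for row in image]


def self_consistency_loss(model: Callable[[Image], float],
                          batch: Sequence[Image]) -> float:
    """Mean absolute gap between the score of each image and of its flip.

    A model whose score does not change under horizontal flipping gets 0.
    """
    gaps = [abs(model(img) - model(hflip(img))) for img in batch]
    return sum(gaps) / len(gaps)


def pairwise_ranking_loss(pred: Sequence[float],
                          mos: Sequence[float],
                          margin: float = 0.0) -> float:
    """Generic hinge penalty on pairs ranked against the subjective order.

    Assumption: a plain pairwise hinge stands in for the paper's
    relative-ranking term, which is not reproduced here.
    """
    total, pairs = 0.0, 0
    for i in range(len(pred)):
        for j in range(len(pred)):
            if mos[i] > mos[j]:  # image i is subjectively better than j
                total += max(0.0, margin - (pred[i] - pred[j]))
                pairs += 1
    return total / pairs if pairs else 0.0


# Toy "models" (hypothetical): one flip-invariant, one flip-sensitive.
def mean_model(image: Image) -> float:
    pixels = [p for row in image for p in row]
    return sum(pixels) / len(pixels)


def left_column_model(image: Image) -> float:
    return sum(row[0] for row in image) / len(image)


image = [[0.0, 1.0], [0.0, 1.0]]
invariant_gap = self_consistency_loss(mean_model, [image])
sensitive_gap = self_consistency_loss(left_column_model, [image])
```

In this sketch the flip-invariant model incurs no consistency penalty, while the model that only looks at the left column is penalized because flipping changes its score. In training, terms like these would typically be added to the main quality-regression loss with weighting factors.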

Paper Structure (11 sections, 9 equations, 4 figures, 2 tables)

Figures (4)

  • Figure 1: Diagram illustrating the NR-IQA Model with inputs and outputs.
  • Figure 2: The basic building block of our proposed ADTRS architecture.
  • Figure 3: Dual-path downsampling Transformer encoder (li2023rethinking) adopted in our proposed method.
  • Figure 4: Schematic of the Self-Attention Mechanism.