Assessing Image Quality Using a Simple Generative Representation

Simon Raviv, Gal Chechik

TL;DR

The paper tackles full-reference image quality assessment (IQA), addressing the limitations of discriminative, class-focused representations in cross-domain settings. It introduces VAE-QA, a lightweight architecture that draws on the latent space of a pre-trained variational autoencoder (VAE), fuses features from multiple layers, and predicts the mean opinion score (MOS) with a small multilayer perceptron (MLP). On standard IQA benchmarks and in cross-dataset tests, VAE-QA achieves state-of-the-art generalization while using substantially fewer parameters and running inference faster than prior methods. The results indicate that generative latent representations better preserve the image details relevant to perceived quality, with potential extensions to video quality assessment and to broader quality tasks built on generative models.
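The pipeline described above (multi-layer VAE features, fusion, MLP head) can be illustrated with a small sketch. This is a hypothetical illustration, not the authors' implementation: the summary does not specify which VAE is used, how fusion is computed, or how large the MLP is. The sketch therefore assumes that per-layer feature maps of the reference and distorted images are already available as plain Python lists, that within-layer fusion is a per-channel mean absolute difference, that across-layer fusion is concatenation, and that the MLP has a single ReLU hidden layer. All function and type names are invented for illustration.

```python
from typing import List, Sequence, Tuple

# Hypothetical stand-in for one VAE layer's activations: a list of channels,
# each channel a flat list of spatial activations. No real VAE is loaded here.
FeatureMap = List[List[float]]
# (W1, b1, w2, b2) for a one-hidden-layer MLP; the shapes are illustrative.
MLPParams = Tuple[List[List[float]], List[float], List[float], float]


def fuse_within_layer(ref: FeatureMap, dist: FeatureMap) -> List[float]:
    """Assumed within-layer fusion: per-channel mean absolute difference
    between reference and distorted activations (a global average pool)."""
    return [
        sum(abs(r - d) for r, d in zip(ref_ch, dist_ch)) / len(ref_ch)
        for ref_ch, dist_ch in zip(ref, dist)
    ]


def fuse_across_layers(pooled: Sequence[List[float]]) -> List[float]:
    """Assumed across-layer fusion: concatenate the per-layer vectors."""
    return [v for layer in pooled for v in layer]


def mlp_predict(x: Sequence[float], params: MLPParams) -> float:
    """Small MLP head: one ReLU hidden layer followed by a linear output."""
    w1, b1, w2, b2 = params
    hidden = [
        max(0.0, sum(w * xi for w, xi in zip(row, x)) + b)
        for row, b in zip(w1, b1)
    ]
    return sum(w * h for w, h in zip(w2, hidden)) + b2


def predict_quality(
    ref_layers: Sequence[FeatureMap],
    dist_layers: Sequence[FeatureMap],
    params: MLPParams,
) -> float:
    """Full-reference score computed from multi-layer features of both images."""
    pooled = [fuse_within_layer(r, d) for r, d in zip(ref_layers, dist_layers)]
    return mlp_predict(fuse_across_layers(pooled), params)
```

When the reference and distorted inputs are identical, every fused feature is zero, so the prediction depends only on the MLP biases. In a trained model, the MLP weights would be fit by regressing MOS on a training set. Only this small head would be trainable, while the VAE stays frozen, which is consistent with the low trainable-parameter count the paper reports.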

Abstract

Perceptual image quality assessment (IQA) is the task of predicting the visual quality of an image as perceived by a human observer. Current state-of-the-art techniques are based on deep representations trained in a discriminative manner. Such representations may ignore visually important features if those features are not predictive of class labels. Recent generative models successfully learn low-dimensional representations using auto-encoding and have been argued to better preserve visual features. Here we leverage existing auto-encoders and propose VAE-QA, a simple and efficient method for predicting image quality in the presence of a full reference. We evaluate our approach on four standard benchmarks and find that it significantly improves generalization across datasets while having fewer trainable parameters, a smaller memory footprint, and a faster run time.

Paper Structure (34 sections, 3 equations, 7 figures, 12 tables, 4 algorithms)

Figures (7)

  • Figure 1: VAE-QA architecture. The feature extraction module uses a VAE to extract representations of the input images. The feature fusion module combines these representations into a compressed representation, using components that operate within each VAE layer and across VAE layers. The quality prediction module passes the compressed representation to an MLP that predicts the quality score of the input images.
  • Figure 2: MOS vs. Predicted MOS for three IQA datasets.
  • Figure 3: MOS vs. Predicted MOS. Trained on KADID-10k, tested on other IQA datasets.
  • Figure 4: Quality prediction by distortion type on the TID2013 dataset. The figure compares the PLCC obtained by our VAE-QA with that obtained by AHIQ.
  • Figure 5: The effect of the number of crops on the SRCC.
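Figures 4 and 5 report accuracy as PLCC (Pearson linear correlation coefficient) and SRCC (Spearman rank-order correlation coefficient) between predicted scores and MOS. Below is a minimal pure-Python sketch of both metrics. It computes PLCC directly on the raw predictions. Many IQA evaluations first fit a nonlinear (for example, logistic) mapping from predictions to MOS before computing PLCC, and that step is omitted here. The function names are illustrative, and tied values receive the average of their ranks.

```python
import math
from typing import List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """PLCC: Pearson linear correlation of two equal-length sequences.
    Undefined (division by zero) if either sequence is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg_rank
        i = j + 1
    return result


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """SRCC: the Pearson correlation of the two rank vectors."""
    return pearson(ranks(x), ranks(y))
```

SRCC measures only whether the predictions order images the same way as MOS. PLCC also reflects how linearly the predictions track MOS, which is why the two metrics are usually reported together.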