Assessing Image Quality Using a Simple Generative Representation
Simon Raviv, Gal Chechik
TL;DR
The paper tackles full-reference image quality assessment (IQA), addressing the limitations of discriminative, class-focused representations in cross-domain settings. It introduces VAE-QA, a lightweight architecture that leverages the latent space of a pre-trained variational autoencoder (VAE), fuses multi-layer features, and predicts the mean opinion score (MOS) with a small MLP. Across standard IQA benchmarks and cross-dataset tests, VAE-QA achieves state-of-the-art generalization while using substantially fewer parameters and running faster at inference than prior methods. The results suggest that generative latent representations better preserve the image details relevant to perceived quality, with potential extensions to video quality assessment and other generative-model-based quality tasks.
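The data flow summarized above (frozen encoder, multi-layer feature fusion, small MLP head) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: `frozen_encoder_features` is a hypothetical stand-in that uses average pooling instead of a real pre-trained VAE encoder, the fusion scheme (pooled reference, distorted, and absolute-difference features, concatenated) is an assumption, and the MLP is untrained, so the printed score carries no meaning beyond showing the shapes involved.

```python
import numpy as np

rng = np.random.default_rng(0)


def frozen_encoder_features(img, n_levels=3):
    """Hypothetical stand-in for a frozen, pre-trained VAE encoder.

    Returns one feature map per "layer" via repeated 2x2 average pooling.
    A real pipeline would return intermediate encoder activations instead.
    """
    feats, x = [], img
    for _ in range(n_levels):
        h, w, c = x.shape
        x = x[: h - h % 2, : w - w % 2]  # crop to even size
        x = x.reshape(h // 2, 2, w // 2, 2, c).mean(axis=(1, 3))
        feats.append(x)
    return feats


def fuse(ref_feats, dist_feats):
    """Fuse multi-layer features into one vector (assumed fusion scheme)."""
    parts = []
    for r, d in zip(ref_feats, dist_feats):
        diff = np.abs(r - d)
        # Global average pooling gives a fixed-length descriptor per layer.
        parts.append(np.concatenate([
            r.mean(axis=(0, 1)),
            d.mean(axis=(0, 1)),
            diff.mean(axis=(0, 1)),
        ]))
    return np.concatenate(parts)


class TinyMLP:
    """Small two-layer regression head; the only trainable part in this sketch."""

    def __init__(self, in_dim, hidden=16):
        self.w1 = rng.normal(0.0, 0.1, (in_dim, hidden))
        self.b1 = np.zeros(hidden)
        self.w2 = rng.normal(0.0, 0.1, (hidden, 1))
        self.b2 = np.zeros(1)

    def __call__(self, x):
        h = np.maximum(x @ self.w1 + self.b1, 0.0)  # ReLU
        return float((h @ self.w2 + self.b2)[0])


def predict_quality(ref, dist, head):
    """Full-reference prediction: encode both images, fuse, regress a score."""
    return head(fuse(frozen_encoder_features(ref), frozen_encoder_features(dist)))


# Toy data: a random "reference" image and a noisy "distorted" copy.
ref = rng.random((32, 32, 3))
dist = np.clip(ref + rng.normal(0.0, 0.1, ref.shape), 0.0, 1.0)

head = TinyMLP(in_dim=27)  # 3 levels x 3 descriptors x 3 channels
score = predict_quality(ref, dist, head)
print(score)
```

In a setup like this, only the MLP head (and any fusion layers) would be trained against human MOS labels while the encoder stays frozen, which is one plausible reason for the small trainable-parameter count the paper reports.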
Abstract
Perceptual image quality assessment (IQA) is the task of predicting the visual quality of an image as perceived by a human observer. Current state-of-the-art techniques are based on deep representations trained in a discriminative manner. Such representations may ignore visually important features if those features are not predictive of class labels. Recent generative models successfully learn low-dimensional representations using auto-encoding and have been argued to better preserve visual features. Here we leverage existing auto-encoders and propose VAE-QA, a simple and efficient method for predicting image quality when a reference image is available (the full-reference setting). We evaluate our approach on four standard benchmarks and find that it significantly improves generalization across datasets, has fewer trainable parameters, a smaller memory footprint, and a faster run time.
