Deep Shape-Texture Statistics for Completely Blind Image Quality Evaluation

Yixuan Li, Peilin Chen, Hanwei Zhu, Keyan Ding, Leida Li, Shiqi Wang

TL;DR

This work tackles completely blind image quality assessment by introducing Deep Shape-Texture Statistics (DSTS), a unified representation that fuses shape-biased and texture-biased deep features via Shape-Texture Adaptive Fusion (STAF). Quality is quantified by a variant of the Mahalanobis distance, $D_q$, between the outer statistics $p_G(\boldsymbol{x})$, formed from pristine images, and the inner statistics $p_M(\boldsymbol{x})$, formed from each distorted image, both computed over a bias-aware, multi-scale deep embedding. The approach achieves state-of-the-art performance across synthetic, authentic, and generative distortions, generalizes well in cross-database settings, and enables personalized BIQA by tailoring the outer statistics to individual users. The framework reduces reliance on subjective MOS labels and reference images, offering a robust, scalable solution for blind image quality evaluation in diverse real-world scenarios.
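The summary does not spell out which variant of the Mahalanobis distance DSTS uses. A common choice in statistics-comparison OU-BIQA models such as NIQE, assumed here for illustration only, models both $p_G(\boldsymbol{x})$ and $p_M(\boldsymbol{x})$ as multivariate Gaussians with means $\boldsymbol{\mu}_G, \boldsymbol{\mu}_M$ and covariances $\boldsymbol{\Sigma}_G, \boldsymbol{\Sigma}_M$, and replaces the single covariance of the classical distance with the average of the two:

```latex
D_q = \sqrt{(\boldsymbol{\mu}_G - \boldsymbol{\mu}_M)^{\top}
      \left(\frac{\boldsymbol{\Sigma}_G + \boldsymbol{\Sigma}_M}{2}\right)^{-1}
      (\boldsymbol{\mu}_G - \boldsymbol{\mu}_M)}
```

Under this assumed form, $D_q = 0$ when the two Gaussians share a mean, and it grows as the inner statistics of an image drift away from the pristine outer statistics.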

Abstract

Opinion-Unaware Blind Image Quality Assessment (OU-BIQA) models aim to predict image quality without training on reference images or subjective quality scores. Within this setting, image statistical comparison is a classic paradigm, but its performance is limited by the representational ability of the visual descriptors. Deep features used as visual descriptors have advanced IQA in recent research, but they have been found to be highly texture-biased and to lack shape bias. On this basis, we find that image shape and texture cues respond differently to distortions, and that the absence of either one results in an incomplete image representation. Therefore, to formulate a well-rounded statistical description of images, we simultaneously utilize the shape-biased and texture-biased deep features produced by Deep Neural Networks (DNNs). More specifically, we design a Shape-Texture Adaptive Fusion (STAF) module to merge shape and texture information, based on which we formulate quality-relevant image statistics. The perceptual quality is quantified by a variant of the Mahalanobis distance between the inner and outer Deep Shape-Texture Statistics (DSTS), where the inner and outer statistics describe the quality fingerprints of the distorted image and of natural images, respectively. The proposed DSTS exploits shape-texture statistical relations across different data scales in the deep domain and achieves state-of-the-art (SOTA) quality prediction performance on images with artificial and authentic distortions.
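The abstract does not give STAF's equations, so the sketch below only illustrates the data flow described here and in Figure 3: shape-branch and texture-branch features from each of five convolutional stages are fused and then concatenated into one embedding per patch. The variance-based softmax gating, the function names, the VGG-like stage widths, and the assumption that every stage has already been pooled to the same set of patches are all hypothetical stand-ins, not the authors' STAF module.

```python
import numpy as np


def adaptive_fuse(shape_feat: np.ndarray, texture_feat: np.ndarray) -> np.ndarray:
    """Fuse one stage of shape- and texture-branch features (patches x channels).

    Hypothetical stand-in for STAF: per channel, each branch is weighted by a
    softmax over the two branches' variances across patches, so the branch
    that varies more on this image contributes more to that channel.
    """
    var = np.stack([shape_feat.var(axis=0), texture_feat.var(axis=0)])  # (2, C)
    weights = np.exp(var - var.max(axis=0))  # numerically stable softmax
    weights /= weights.sum(axis=0)
    return weights[0] * shape_feat + weights[1] * texture_feat


def shape_texture_embedding(shape_stages, texture_stages) -> np.ndarray:
    """Fuse each stage, then concatenate along the channel axis.

    Assumes every stage's features were already pooled to the same patches.
    """
    fused = [adaptive_fuse(s, t) for s, t in zip(shape_stages, texture_stages)]
    return np.concatenate(fused, axis=1)


rng = np.random.default_rng(0)
widths = [64, 128, 256, 512, 512]  # assumed VGG-like widths of the five stages
shape_stages = [rng.normal(size=(100, c)) for c in widths]
texture_stages = [rng.normal(size=(100, c)) for c in widths]
embedding = shape_texture_embedding(shape_stages, texture_stages)
print(embedding.shape)  # (100, 1472)
```

Because the weights are recomputed from each image's own features, this toy fusion adapts per input without any training, which keeps it consistent with the opinion-unaware setting.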

Paper Structure

This paper contains 22 sections, 17 equations, 6 figures, and 6 tables.

Figures (6)

  • Figure 1: Visualization of image shape-texture activations and statistical distributions. The top row contains a distorted image and its Class Activation Mapping (CAM) maps [selvaraju2017grad] from the shape and texture models, which indicate the regions the models attend to. Cooler colors indicate more attention paid by the models. The second row contains t-SNE [tsne] visualizations of the image statistical distributions in terms of shape, texture, and shape-texture deep features.
  • Figure 2: Overview of the proposed DSTS framework, which contains three stages. In stage 1, the image outer statistics are formulated from a set of ideally pristine images using shape-texture statistics in the deep domain. In stage 2, the inner statistics are extracted from each distorted image using shape-texture statistics in the deep domain. In stages 1 and 2, a mezzanine stage transforms the images into the perceptual space; it is detailed in Fig. 3. In stage 3, a quality-aware statistical distance between the inner and outer distributions quantifies the perceptual quality.
  • Figure 3: Illustration of the mezzanine stage for shape-texture oriented perceptual transformation based on the shape and texture CNN branches. The final shape-texture embedding is composed of the outputs of five convolutional stages, fused by the proposed STAF module.
  • Figure 4: Quality prediction results and statistical visualizations for distorted images sampled from the LIVE database, showing the distortion types, DMOSs, predicted DSTS values, and 2-D t-SNE visualizations of the images' inner statistical distributions.
  • Figure 5: Exemplar images with ratings from subject No. 44771835 of the KADID database. The first row contains the distorted images given the highest quality rating of 5, with the corresponding DMOSs shown in the bottom-left corner. The second row contains six other distorted images rated from 1 to 4 by this subject, sorted in increasing order of their predicted DSTS values.
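The three stages in Figure 2, plus the personalized setting suggested by Figure 5, can be sketched as follows. This is a minimal illustration under several assumptions: each image has already been mapped to a (patches x dimensions) shape-texture embedding (random arrays stand in for it), both inner and outer statistics are multivariate Gaussians, $D_q$ takes the NIQE-style pooled-covariance form, and a user's outer statistics are built from the images that user rated 5. All function names and the ratings list are hypothetical.

```python
import numpy as np


def fit_mvg(features: np.ndarray):
    """Fit a multivariate Gaussian to row-wise feature vectors (patches x dims)."""
    return features.mean(axis=0), np.cov(features, rowvar=False)


def quality_distance(mu_g, cov_g, mu_m, cov_m) -> float:
    """Assumed NIQE-style D_q: Mahalanobis distance with a pooled covariance."""
    diff = mu_g - mu_m
    pooled = (cov_g + cov_m) / 2.0
    # pinv tolerates singular covariances when an image yields few patches
    value = diff @ np.linalg.pinv(pooled) @ diff
    return float(np.sqrt(max(value, 0.0)))  # clip tiny negative round-off


def outer_statistics(pristine_embeddings):
    """Stage 1: pool the patch embeddings of all pristine images into one MVG."""
    return fit_mvg(np.vstack(pristine_embeddings))


def predict_quality(outer, distorted_embedding) -> float:
    """Stages 2-3: fit the inner MVG to one image and measure its distance."""
    mu_m, cov_m = fit_mvg(distorted_embedding)
    return quality_distance(outer[0], outer[1], mu_m, cov_m)


def personalized_outer(embeddings, ratings, top_rating=5):
    """Hypothetical: outer statistics from the images one user rated highest."""
    chosen = [e for e, r in zip(embeddings, ratings) if r == top_rating]
    return outer_statistics(chosen)


rng = np.random.default_rng(0)
dim = 8  # toy embedding width; real shape-texture embeddings are far wider
pristine = [rng.normal(size=(200, dim)) for _ in range(10)]
outer = outer_statistics(pristine)

clean = rng.normal(size=(200, dim))
shifted = rng.normal(loc=1.5, size=(200, dim))  # stands in for a distortion
print(predict_quality(outer, clean), predict_quality(outer, shifted))

rated_images = [rng.normal(size=(200, dim)) for _ in range(10)]
ratings = [5, 3, 5, 1, 5, 2, 4, 5, 3, 5]  # made-up ratings for illustration
user_outer = personalized_outer(rated_images, ratings)
print(predict_quality(user_outer, shifted))
```

Swapping only the outer statistics is what makes the personalized variant cheap in this sketch: the embedding and the distance stay fixed, and each user contributes just a reference set.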