Deep Shape-Texture Statistics for Completely Blind Image Quality Evaluation
Yixuan Li, Peilin Chen, Hanwei Zhu, Keyan Ding, Leida Li, Shiqi Wang
TL;DR
This work tackles completely blind image quality assessment by introducing Deep Shape-Texture Statistics (DSTS), a unified representation that fuses shape-biased and texture-biased deep features via Shape-Texture Adaptive Fusion (STAF). Quality is quantified by a variant Mahalanobis distance $D_q$ between the outer statistics $p_G(\boldsymbol{x})$ from pristine images and inner statistics $p_M(\boldsymbol{x})$ from distorted images, computed over a bias-aware, multi-scale deep embedding. The approach demonstrates state-of-the-art performance across synthetic, authentic, and generative distortions, shows strong generalization in cross-database settings, and enables personalized BIQA by tailoring outer statistics to individual users. This framework reduces reliance on subjective MOS labels and reference images, offering a robust, scalable solution for blind image quality evaluation in diverse real-world scenarios.
Abstract
Opinion-Unaware Blind Image Quality Assessment (OU-BIQA) models aim to predict image quality without training on reference images or subjective quality scores. Among such approaches, image statistical comparison is a classic paradigm, but its performance is limited by the representation ability of the visual descriptors. Deep features have advanced IQA as visual descriptors in recent research, yet they have been found to be highly texture-biased and to lack shape bias. Building on this observation, we find that image shape and texture cues respond differently to distortions, and that the absence of either one results in an incomplete image representation. Therefore, to formulate a well-rounded statistical description of images, we use the shape-biased and texture-biased deep features produced by Deep Neural Networks (DNNs) simultaneously. More specifically, we design a Shape-Texture Adaptive Fusion (STAF) module to merge shape and texture information, based on which we formulate quality-relevant image statistics. The perceptual quality is quantified by a variant of the Mahalanobis distance between the inner and outer Deep Shape-Texture Statistics (DSTS), wherein the inner and outer statistics describe the quality fingerprints of the distorted image and of natural images, respectively. The proposed DSTS exploits shape-texture statistical relations between different data scales in the deep domain and achieves state-of-the-art (SOTA) quality prediction performance on images with artificial and authentic distortions.
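
The statistical comparison step can be illustrated with a minimal sketch. The abstract does not spell out the exact form of the variant Mahalanobis distance, so the code below assumes the pooled-covariance form commonly used in NIQE-style models, $D = \sqrt{(\mu_G - \mu_M)^\top \left(\tfrac{\Sigma_G + \Sigma_M}{2}\right)^{-1} (\mu_G - \mu_M)}$. The function names `fit_gaussian` and `niqe_style_distance` are hypothetical, and random vectors stand in for the fused shape-texture features that the STAF module would produce.

```python
import numpy as np


def fit_gaussian(features):
    """Fit a multivariate Gaussian to an (n_samples, dim) feature array."""
    features = np.asarray(features, dtype=np.float64)
    mu = features.mean(axis=0)
    sigma = np.atleast_2d(np.cov(features, rowvar=False))
    return mu, sigma


def niqe_style_distance(mu_outer, sigma_outer, mu_inner, sigma_inner):
    """Pooled-covariance Mahalanobis distance (assumed form, not the paper's)."""
    diff = np.asarray(mu_outer, dtype=np.float64) - np.asarray(mu_inner, dtype=np.float64)
    pooled = (np.asarray(sigma_outer) + np.asarray(sigma_inner)) / 2.0
    # The pseudo-inverse keeps the computation defined if the pooled covariance is singular.
    squared = diff @ np.linalg.pinv(pooled) @ diff
    return float(np.sqrt(max(squared, 0.0)))


# Placeholder data: random vectors stand in for deep shape-texture features.
rng = np.random.default_rng(0)
pristine_features = rng.normal(0.0, 1.0, size=(500, 8))   # outer statistics source
distorted_features = rng.normal(0.5, 1.5, size=(200, 8))  # inner statistics source

mu_g, sigma_g = fit_gaussian(pristine_features)
mu_m, sigma_m = fit_gaussian(distorted_features)
score = niqe_style_distance(mu_g, sigma_g, mu_m, sigma_m)
print(f"distance to pristine statistics: {score:.3f}")
```

Under this assumed form, a larger distance means the features of the distorted image deviate more from the pristine statistics, which corresponds to lower predicted quality. Comparing a distribution with itself yields a distance of zero.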
