Opinion-Unaware Blind Image Quality Assessment using Multi-Scale Deep Feature Statistics

Zhangkai Ni, Yue Liu, Keyan Ding, Wenhan Yang, Hanli Wang, Shiqi Wang

TL;DR

MDFS addresses blind image quality assessment (BIQA) without subjective labels by combining multi-scale deep features from pre-trained vision models with a multivariate Gaussian (MVG) model. Training uses only high-quality images to fit a benchmark MVG, and each test image is scored by the distance between its own MVG and the benchmark. Ablations, cross-dataset evaluations, and retraining studies show superior or competitive alignment with human perception and strong generalization across diverse distortion types. The code is publicly available.

Abstract

Deep learning-based methods have significantly influenced the blind image quality assessment (BIQA) field; however, these methods often require training on large amounts of human rating data. In contrast, traditional knowledge-based methods are cost-effective to train but struggle to extract features aligned with human visual perception. To bridge these gaps, we propose integrating deep features from pre-trained visual models with a statistical analysis model into a Multi-scale Deep Feature Statistics (MDFS) model for opinion-unaware BIQA (OU-BIQA), thereby eliminating the reliance on human rating data and significantly improving training efficiency. Specifically, we extract patch-wise multi-scale features from pre-trained vision models and fit them with a multivariate Gaussian (MVG) model. The final quality score is determined by quantifying the distance between the MVG model derived from the test image and the benchmark MVG model derived from the high-quality image set. A comprehensive series of experiments conducted on various datasets shows that our proposed model exhibits superior consistency with human visual perception compared to state-of-the-art BIQA models. Furthermore, it shows improved generalizability across diverse target-specific BIQA tasks. Our code is available at: https://github.com/eezkni/MDFS
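
The scoring step described in the abstract (fit an MVG to features, then measure its distance to a benchmark MVG) can be illustrated with a minimal sketch. Several things here are assumptions rather than details taken from the paper: the random arrays stand in for patch-wise multi-scale deep features, the function names `fit_mvg` and `mvg_distance` are hypothetical, and the distance is the NIQE-style form with a pooled covariance, which may differ from the exact distance MDFS uses.

```python
import numpy as np


def fit_mvg(features):
    """Fit an MVG (mean vector, covariance matrix) to an (n_samples, n_dims) array."""
    features = np.asarray(features, dtype=np.float64)
    mean = features.mean(axis=0)
    cov = np.cov(features, rowvar=False)  # columns are feature dimensions
    return mean, cov


def mvg_distance(mean_a, cov_a, mean_b, cov_b):
    """NIQE-style distance between two MVG models (assumed form, not confirmed by the source)."""
    diff = mean_a - mean_b
    pooled = (cov_a + cov_b) / 2.0
    # The pseudo-inverse keeps the computation defined if the pooled covariance is singular.
    squared = diff @ np.linalg.pinv(pooled) @ diff
    return float(np.sqrt(max(squared, 0.0)))


rng = np.random.default_rng(0)

# "Training": fit the benchmark MVG on stand-in features of high-quality images.
pristine = rng.normal(0.0, 1.0, size=(2000, 8))
benchmark_mean, benchmark_cov = fit_mvg(pristine)

# "Testing": fit an MVG per test image and measure its distance to the benchmark.
clean_test = rng.normal(0.0, 1.0, size=(200, 8))      # statistics close to the benchmark
distorted_test = rng.normal(1.5, 2.0, size=(200, 8))  # shifted statistics, mimicking distortion

score_clean = mvg_distance(*fit_mvg(clean_test), benchmark_mean, benchmark_cov)
score_distorted = mvg_distance(*fit_mvg(distorted_test), benchmark_mean, benchmark_cov)

print(f"clean: {score_clean:.3f}  distorted: {score_distorted:.3f}")
```

Under this assumed distance, a larger score means the test features deviate more from the high-quality benchmark, so the stand-in "distorted" sample scores higher than the "clean" one. In a real pipeline the random arrays would be replaced by features from a frozen pre-trained network.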

Paper Structure

This paper contains 30 sections, 11 equations, 4 figures, 10 tables.

Figures (4)

  • Figure 1: Overview of the proposed MDFS model. (a) Training phase: a benchmark multivariate Gaussian (MVG) model is fitted from a set of high-quality images using a frozen multi-scale deep feature extraction module (e.g., ResNet, VGG, or EfficientNet), a statistical data analysis model, and an MVG fitting model. (b) Testing phase: the final quality score of a test image is the distance between an MVG model fitted to the test image's features and the benchmark MVG model obtained in the training phase.
  • Figure 2: Scatter plots of the mean opinion scores (MOS) versus the objective scores computed by the IQA models: (a) NIQE; (b) QAC; (c) PIQE; (d) LPSI; (e) ILNIQE; (f) dipIQ; (g) SNP-NIQE; (h) NPQI; (i) ContentSep, and (j) MDFS, respectively.
  • Figure 3: Statistical significance test results for various OU-BIQA methods on the (a) KADID, (b) TID2013, and (c) CSIQ datasets. A value of "1" indicates that the model in the row is significantly better than the model in the column.
  • Figure 4: Failure cases. The distorted images are generated by the (a) NEPN and (b) LBD distortions in TID2013.