Deep Feature Statistics Mapping for Generalized Screen Content Image Quality Assessment

Baoliang Chen, Hanwei Zhu, Lingyu Zhu, Shiqi Wang, Sam Kwong

TL;DR

This work tackles the poor cross-dataset performance of screen content image (SCI) quality assessment by learning screen-content statistics directly in the deep feature space. It introduces DFSS-IQA, which uses triplet-based training to disentangle semantic content from distortion cues and regularizes the distortion features toward a Gaussian distribution using Maximum Mean Discrepancy (MMD), enabling robust no-reference quality estimation. A distortion-type classifier and an attention mechanism tie semantic information to distortion cues, and the quality score is regressed with a mean absolute error (MAE) loss. Extensive cross- and intra-dataset experiments on SIQAD and SCID demonstrate superior generalization, with ablations and visualizations validating the effectiveness of the learned feature statistics and the disentangled representation.
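
As a rough illustration of the Maximum Mean Discrepancy idea mentioned above, the following sketch compares two sets of feature vectors against samples from a standard Gaussian prior. It is not the authors' implementation: the RBF kernel, the bandwidth, the biased estimator, the feature dimension, and all function names (`rbf_kernel`, `mmd_squared`, `sample_standard_gaussian`) are assumptions chosen for brevity, and the "features" are synthetic numbers rather than network outputs.

```python
import math
import random


def rbf_kernel(u, v, bandwidth=1.0):
    """Gaussian (RBF) kernel between two equal-length feature vectors."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.exp(-sq_dist / (2.0 * bandwidth ** 2))


def mmd_squared(xs, ys, bandwidth=1.0):
    """Biased estimate of the squared MMD between two sets of vectors."""

    def mean_kernel(a_set, b_set):
        total = sum(rbf_kernel(a, b, bandwidth) for a in a_set for b in b_set)
        return total / (len(a_set) * len(b_set))

    return mean_kernel(xs, xs) + mean_kernel(ys, ys) - 2.0 * mean_kernel(xs, ys)


def sample_standard_gaussian(n, dim, rng):
    """Draw n vectors of length dim with i.i.d. N(0, 1) entries."""
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n)]


rng = random.Random(0)
dim = 4  # hypothetical feature dimension

# Samples from the Gaussian prior that the regularizer pulls features toward.
prior = sample_standard_gaussian(64, dim, rng)

# Synthetic stand-in for features that already follow the prior.
pristine_like = sample_standard_gaussian(64, dim, rng)

# Synthetic stand-in for features pushed away from the prior (mean shifted by 2).
shifted = [[v + 2.0 for v in row] for row in sample_standard_gaussian(64, dim, rng)]

mmd_pristine = mmd_squared(pristine_like, prior)
mmd_shifted = mmd_squared(shifted, prior)
print(f"MMD^2 (prior-like features): {mmd_pristine:.4f}")
print(f"MMD^2 (shifted features):    {mmd_shifted:.4f}")
```

In this toy setting the shifted set yields a clearly larger MMD estimate than the prior-like set, which is the kind of distribution deviation a regularizer of this type would penalize.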

Abstract

The statistical regularities of natural images, referred to as natural scene statistics, play an important role in no-reference image quality assessment. However, it has been widely acknowledged that screen content images (SCIs), which are typically computer generated, do not hold such statistics. Here we make the first attempt to learn the statistics of SCIs, based upon which the quality of SCIs can be effectively determined. The underlying mechanism of the proposed approach is based upon the mild assumption that the SCIs, which are not physically acquired, still obey certain statistics that could be understood in a learning fashion. We empirically show that the statistics deviation could be effectively leveraged in quality assessment, and the proposed method is superior when evaluated in different settings. Extensive experimental results demonstrate the Deep Feature Statistics based SCI Quality Assessment (DFSS-IQA) model delivers promising performance compared with existing NR-IQA models and shows a high generalization capability in the cross-dataset settings. The implementation of our method is publicly available at https://github.com/Baoliang93/DFSS-IQA.

Paper Structure (16 sections, 12 equations, 8 figures, 9 tables)

Figures (8)

  • Figure 1: Illustrations of the statistics of NIs and SCIs. (a) Distribution of Naturalness Values (DNV) of reference images in the TID2013 dataset (Ponomarenko et al., 2015); (b) DNV of reference images in the SIQAD dataset (Yang et al., 2015); (c) The deep feature statistics of reference images in the SIQAD dataset obtained by our proposed method.
  • Figure 2: Illustration of the framework of our proposed method. In the training phase, the images are grouped into triplets, each consisting of a reference image ($\boldsymbol{I^{r}}$), a distorted image ($\boldsymbol{I^{d}}$), and an auxiliary image ($\boldsymbol{I^{a}}$). In particular, $\boldsymbol{I^{d}}$ shares the same content as $\boldsymbol{I^{r}}$, while its quality is degraded by distortion. $\boldsymbol{I^{a}}$ is sampled from the pristine images, but its content differs from that of $\boldsymbol{I^{r}}$. The quality-aware feature of each image is then extracted via a multi-scale feature generator and further disentangled into a semantic-aware feature ($\boldsymbol{F^{rs}}$, $\boldsymbol{F^{ds}}$, $\boldsymbol{F^{as}}$) and a distortion-aware feature ($\boldsymbol{F^{rd}}$, $\boldsymbol{F^{dd}}$, $\boldsymbol{F^{ad}}$). We force the normalized distortion-aware feature to obey a unified distribution ($\boldsymbol{F^{gaus}}$) and treat this unified distribution as the feature statistics shared by the SCIs. As a consequence, the distortion of $\boldsymbol{I^{d}}$ can be measured by estimating the feature distribution divergence. Finally, the quality of $\boldsymbol{I^{d}}$ can be regressed by incorporating both its semantic information ($\boldsymbol{F^{ds}}$) and its distortion information ($\boldsymbol{F^{dd}}$). In the testing phase, only the testing image (without a reference) is needed for quality prediction.
  • Figure 3: Distorted SCIs sampled from the SCID dataset. First row: images distorted by motion blur. Second row: images distorted by Gaussian noise. The images in each row are degraded with the same distortion type and level but have different quality scores because of semantic variance.
  • Figure 4: Structure details of the multi-scale feature generator.
  • Figure 5: Study of the hyperparameter $\lambda_2$ under the cross-dataset setting: SIQAD $\rightarrow$ SCID.
  • ...and 3 more figures
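
The triplet grouping described in the Figure 2 caption can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' data loader: the function name `sample_triplet`, the dictionary layout, and the file names are all hypothetical, and images are represented only by string identifiers.

```python
import random


def sample_triplet(references, distorted_by_ref, rng):
    """Pick one (reference, distorted, auxiliary) triplet of image identifiers.

    references:       list of pristine image identifiers
    distorted_by_ref: dict mapping each reference identifier to the list of
                      distorted versions that share its content
    rng:              a random.Random instance
    """
    if len(references) < 2:
        raise ValueError("need at least two pristine images to draw an auxiliary one")
    ref = rng.choice(references)              # I^r: pristine image
    dist = rng.choice(distorted_by_ref[ref])  # I^d: same content, degraded quality
    # I^a: pristine image whose content differs from the reference
    aux = rng.choice([r for r in references if r != ref])
    return ref, dist, aux


# Hypothetical identifiers, used only for illustration.
references = ["ref_01.png", "ref_02.png", "ref_03.png"]
distorted_by_ref = {
    "ref_01.png": ["ref_01_blur.png", "ref_01_noise.png"],
    "ref_02.png": ["ref_02_blur.png", "ref_02_noise.png"],
    "ref_03.png": ["ref_03_blur.png", "ref_03_noise.png"],
}

rng = random.Random(0)
ref, dist, aux = sample_triplet(references, distorted_by_ref, rng)
print(ref, dist, aux)
```

Only the training phase would need such triplets; as the caption notes, testing uses a single image without a reference.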