Sliced Maximal Information Coefficient: A Training-Free Approach for Image Quality Assessment Enhancement

Kang Xiao, Xu Wang, Yulin He, Baoliang Chen, Xuelin Shen

TL;DR

This work tackles the misalignment between FR-IQA predictions and human perception by introducing SMIC, a training-free attention mechanism that estimates the statistical dependency between the reference and degraded images in deep feature space. SMIC evaluates the maximal information coefficient on random projections (slices) of the features to generate attention maps, which weight the local distortion cues produced by existing IQA measures and improve their alignment with human judgments. The approach is validated on six IQA datasets (including GAN- and SR-based distortions) with multiple baseline measures, showing consistent, model-agnostic gains without any training. The method is practical and interpretable, offering a scalable path to more MOS-consistent image quality assessment.

Abstract

Full-reference image quality assessment (FR-IQA) models generally operate by measuring the visual differences between a degraded image and its reference. However, existing FR-IQA models, including both classical measures (e.g., PSNR and SSIM) and deep-learning-based measures (e.g., LPIPS and DISTS), still exhibit limitations in capturing the full perceptual characteristics of the human visual system (HVS). In this paper, instead of designing a new FR-IQA measure, we explore a generalized strategy for estimating human visual attention that mimics the process of human quality rating and enhances existing IQA models. In particular, we model human attention generation by measuring the statistical dependency between the degraded image and the reference image. The dependency is captured in a training-free manner by our proposed sliced maximal information coefficient (SMIC) and generalizes surprisingly well across different IQA measures. Experimental results verify that the performance of existing IQA models is consistently improved when our attention module is incorporated. The source code is available at https://github.com/KANGX99/SMIC.

Paper Structure (12 sections, 8 equations, 5 figures, 2 tables)


Figures (5)

  • Figure 1: Existing FR-IQA models can be enhanced by our attention modeling. The images in the first and third columns are two different distorted images. Humans prefer the image in the third column, contrary to the predictions of PSNR and SSIM (1st row) and of LPIPS and DISTS (2nd row). By incorporating the attention maps (shown in the second column), the enhanced FR-IQA models give judgments that are more consistent with human preference. We estimate human attention with the proposed Sliced Maximal Information Coefficient (SMIC), without any training process.
  • Figure 2: A toy example to illustrate the calculation of MIC between $X$ and $Y$.
  • Figure 3: The framework of our enhancement strategy for existing FR-IQA models.
  • Figure 4: Illustration of the attention map generation by our proposed SMIC.
  • Figure 5: Visualisation of the generated attention maps. $I^r$ and $I^d$ are the reference and distorted images. $M^d$ denotes the distortion maps generated by PSNR (1st row) and SSIM (2nd row). $M^d_s$ ($s = 3, 4$) are the distortion maps measured by LPIPS (3rd row), DISTS (4th row), and DeepWSD (5th row) at the $s$-th stage of the VGG16 network. $M^a$ and $M^a_s$ are the estimated attention maps. $M^r$ and $M^r_s$ are the rectified distortion maps, with $M^r = M^d \otimes M^a$ and $M^r_s = M^d_s \otimes M^a_s$.
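
The captions above mention three steps: computing MIC between two variables (Figure 2), estimating attention with SMIC (Figure 4), and rectifying a distortion map as $M^r = M^d \otimes M^a$ (Figure 5). The sketch below is only a rough illustration of those ideas and is not the authors' implementation (see the linked repository for that). It rests on several assumptions: MIC is approximated with equal-width grids only, whereas the full MIC also optimizes where the grid lines fall; the grid budget of about $n^{0.6}$ cells is the commonly used default; the same random direction is applied to both feature sets; and $\otimes$ is read as an element-wise product. All function names (`mutual_information_bits`, `mic_equal_width`, `sliced_mic`, `rectify`) are hypothetical.

```python
import numpy as np


def mutual_information_bits(x, y, bins_x, bins_y):
    """Plug-in mutual information (in bits) from an equal-width 2-D histogram."""
    joint, _, _ = np.histogram2d(x, y, bins=(bins_x, bins_y))
    p = joint / joint.sum()
    # Product of the two marginals, same shape as the joint table.
    outer = p.sum(axis=1, keepdims=True) @ p.sum(axis=0, keepdims=True)
    nz = p > 0  # skip empty cells to avoid log(0)
    return float(np.sum(p[nz] * np.log2(p[nz] / outer[nz])))


def mic_equal_width(x, y):
    """Simplified MIC: best normalized MI over equal-width grids.

    Assumption: grids are limited to bins_x * bins_y <= n**0.6 cells and
    only equal-width bins are tried; the full MIC also searches bin placement.
    """
    budget = int(len(x) ** 0.6)
    best = 0.0
    for bx in range(2, budget // 2 + 1):
        for by in range(2, budget // bx + 1):
            mi = mutual_information_bits(x, y, bx, by)
            best = max(best, mi / np.log2(min(bx, by)))
    return best


def sliced_mic(feat_ref, feat_dist, num_slices=16, seed=0):
    """Average the simplified MIC over random 1-D projections ("slices").

    feat_ref and feat_dist are (n, c) arrays: n samples of c-dim features.
    Assumption: one shared random unit direction per slice for both inputs.
    """
    rng = np.random.default_rng(seed)
    scores = []
    for _ in range(num_slices):
        theta = rng.standard_normal(feat_ref.shape[1])
        theta /= np.linalg.norm(theta)
        scores.append(mic_equal_width(feat_ref @ theta, feat_dist @ theta))
    return float(np.mean(scores))


def rectify(distortion_map, attention_map):
    """M^r = M^d (x) M^a, read here as an element-wise product (assumption)."""
    return distortion_map * attention_map


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    ref = rng.standard_normal((200, 8))
    print("dependent:  ", sliced_mic(ref, ref))
    print("independent:", sliced_mic(ref, rng.standard_normal((200, 8))))
```

With the plug-in estimate, each normalized score lies between 0 and 1, so the dependency values can be compared across grids and slices. How the paper turns such scores into a spatial attention map (for example, per local window or per network stage) is not specified in the text above and is left out of this sketch.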