Contrastive Local Manifold Learning for No-Reference Image Quality Assessment

Zihao Huang, Runze Hu, Timin Gao, Yan Zhang, Yunhang Shen, Ke Li

TL;DR

The paper tackles the challenge of no-reference image quality assessment by preserving local perceptual structure through local manifold learning. It introduces LML-IQA, a contrastive NR-IQA framework that uses visually salient patches as positives, non-salient regions as intra-class negatives, and cross-image crops as inter-class negatives, all within a teacher-student mutual learning setup with EMA stabilization. The method jointly learns discriminative local manifolds and predicts quality scores, validated on eight NR-IQA benchmarks where it achieves state-of-the-art or competitive results and demonstrates strong cross-dataset generalization and data-efficient learning. Visualizations reveal improved attention to distortion regions and clearer quality feature separations, underscoring the practical impact for robust perceptual quality assessment. Overall, LML-IQA advances NR-IQA by aligning learning with human attention to salient regions and preserving local manifold structure under a principled contrastive objective.

Abstract

Image Quality Assessment (IQA) methods typically overlook local manifold structures, leading to compromised discriminative capabilities in perceptual quality evaluation. To address this limitation, we present LML-IQA, an innovative no-reference IQA (NR-IQA) approach that leverages a combination of local manifold learning and contrastive learning. Our approach first extracts multiple patches from each image and identifies the most visually salient region. This salient patch serves as a positive sample for contrastive learning, while other patches from the same image are treated as intra-class negatives to preserve local distinctiveness. Patches from different images also act as inter-class negatives to enhance feature separation. Additionally, we introduce a mutual learning strategy to improve the model's ability to recognize and prioritize visually important regions. Comprehensive experiments across eight benchmark datasets demonstrate significant performance gains over state-of-the-art methods, achieving a PLCC of 0.942 on TID2013 (compared to 0.908) and 0.977 on CSIQ (compared to 0.965).
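The sampling scheme in the abstract can be illustrated with a generic InfoNCE-style loss. This is only a sketch under assumptions, not the paper's exact objective: the abstract does not say what the positive is compared against, so the `anchor` embedding here is hypothetical, as are the function names `cosine` and `info_nce`, the cosine similarity measure, and the `temperature` value.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def info_nce(anchor, positive, intra_negatives, inter_negatives, temperature=0.1):
    """Generic InfoNCE loss (assumed form, not taken from the paper).

    anchor          -- hypothetical reference embedding for the image
    positive        -- embedding of the salient patch
    intra_negatives -- embeddings of other patches from the same image
    inter_negatives -- embeddings of patches from different images
    """
    pos_logit = cosine(anchor, positive) / temperature
    negatives = list(intra_negatives) + list(inter_negatives)
    logits = [pos_logit] + [cosine(anchor, n) / temperature for n in negatives]
    # Numerically stable log-sum-exp over the positive and all negatives.
    peak = max(logits)
    log_denominator = peak + math.log(sum(math.exp(x - peak) for x in logits))
    # Equivalent to -log(softmax probability of the positive).
    return log_denominator - pos_logit
```

The loss is small when the anchor is much closer to the salient patch than to any negative, and grows when a negative from the same image or another image is closer. Both kinds of negatives share one denominator in this sketch. The paper may weight or separate them differently.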

Paper Structure

This paper contains 25 sections, 10 equations, 5 figures, and 6 tables.

Figures (5)

  • Figure 1: Illustration of contrastive learning paradigms in NR-IQA, with MOS values represented by circle color. (a) Prior IQA methods using contrastive learning aim for feature convergence among all crops of an image. The consequence is a failure to retain the local manifold, illustrated by the progressively smaller feature distances inside the dashed box. (b) Our framework is designed to preserve the local manifold, thereby upholding a diverse feature space and maintaining consistent feature distances.
  • Figure 2: Overview of the LML-IQA framework. Given an input image, the teacher model first identifies the most salient region using visual saliency detection. Multiple random crops are generated, with the salient region as the positive sample, random crops from the same image as intra-class negatives, and crops from other images as inter-class negatives. These samples are processed by the student encoder for feature extraction and contrastive learning, followed by quality score prediction with the student decoder. Teacher-student mutual learning is enabled via an exponential moving average (EMA) mechanism.
  • Figure 3: gMAD competition results between DEIQT and LML-IQA. In the first two columns, LML-IQA is the attacker and DEIQT is the defender; the roles are reversed in the last two columns. Each row, from top to bottom, fixes the defender at a constant quality level, ranging from low to high. The numerical values below each image indicate the attacker's perceived quality score.
  • Figure 4: Comparison of Grad-CAM activation maps between the baseline DEIQT and LML-IQA. For each example, the figure displays three rows: (1) the input image, (2) the CAM from the baseline, and (3) the CAM from LML-IQA. The numerical values provided below each column correspond to the Ground Truth score, the baseline's prediction, and our model's prediction, respectively.
  • Figure 5: The t-SNE visualization of the quality features of the LIVEC training set learned by our LML-IQA, with different colors representing different quality score ranges.
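The EMA mechanism mentioned in the Figure 2 caption can be sketched as a simple weight update. This is an assumption-laden illustration: the caption does not state the update direction or the decay rate, so the sketch follows the common convention in which the teacher tracks a moving average of the student. The function name `ema_update`, the `decay` default, and the use of scalar weights in a dictionary are all hypothetical.

```python
def ema_update(teacher, student, decay=0.999):
    """Return updated teacher weights as an exponential moving average.

    teacher, student -- dicts mapping parameter names to float values
                        (scalars stand in for real weight tensors)
    decay            -- assumed smoothing factor; closer to 1 means the
                        teacher changes more slowly
    """
    return {
        name: decay * teacher[name] + (1.0 - decay) * student[name]
        for name in teacher
    }

# Usage: one update step after the student has taken an optimizer step.
teacher_weights = {"w": 1.0, "b": 0.0}
student_weights = {"w": 0.0, "b": 1.0}
teacher_weights = ema_update(teacher_weights, student_weights, decay=0.9)
```

Because the teacher moves only a small fraction toward the student at each step, its outputs change smoothly over training, which is the usual motivation for this kind of stabilization.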