Contrastive Local Manifold Learning for No-Reference Image Quality Assessment
Zihao Huang, Runze Hu, Timin Gao, Yan Zhang, Yunhang Shen, Ke Li
TL;DR
The paper addresses no-reference image quality assessment (NR-IQA) by preserving local perceptual structure through local manifold learning. It introduces LML-IQA, a contrastive NR-IQA framework that uses visually salient patches as positives, non-salient regions of the same image as intra-class negatives, and crops from other images as inter-class negatives, all within a teacher-student mutual learning setup stabilized by an exponential moving average (EMA). The method jointly learns discriminative local manifolds and predicts quality scores. On eight NR-IQA benchmarks it achieves state-of-the-art or competitive results, with strong cross-dataset generalization and data-efficient learning. Visualizations show improved attention to distorted regions and clearer separation of quality features. Overall, LML-IQA advances NR-IQA by aligning learning with human attention to salient regions while preserving local manifold structure under a contrastive objective.
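To make the EMA stabilization concrete, the following is a minimal sketch of an exponential-moving-average weight update, not the authors' implementation. The function name `ema_update`, the flat list-of-floats weight representation, the decay value, and the assumption that the teacher tracks the student are all illustrative choices; the summary above does not specify them.

```python
def ema_update(teacher, student, decay=0.999):
    """Return new teacher weights as an EMA of the student weights.

    Assumption: the teacher follows the student; the summary does not
    state which network is the averaged one. Weights are plain lists
    of floats here purely for illustration.
    """
    if len(teacher) != len(student):
        raise ValueError("teacher and student must have the same length")
    # Each teacher weight moves a small step (1 - decay) toward the student.
    return [decay * t + (1.0 - decay) * s for t, s in zip(teacher, student)]


# Usage: one update step with a deliberately large step size (decay=0.9).
teacher_weights = [1.0, 2.0]
student_weights = [0.0, 4.0]
teacher_weights = ema_update(teacher_weights, student_weights, decay=0.9)
```

A decay close to 1 makes the averaged network change slowly, which is the usual reason EMA is used to stabilize a teacher-student setup.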
Abstract
Image Quality Assessment (IQA) methods typically overlook local manifold structures, which weakens their ability to discriminate perceptual quality. To address this limitation, we present LML-IQA, a no-reference IQA (NR-IQA) approach that combines local manifold learning with contrastive learning. Our approach first extracts multiple patches from each image and identifies the most visually salient region. This salient patch serves as the positive sample for contrastive learning, while the other patches from the same image are treated as intra-class negatives to preserve local distinctiveness. Patches from different images act as inter-class negatives to enhance feature separation. Additionally, we introduce a mutual learning strategy to improve the model's ability to recognize and prioritize visually important regions. Comprehensive experiments across eight benchmark datasets demonstrate significant performance gains over state-of-the-art methods, achieving a Pearson linear correlation coefficient (PLCC) of 0.942 on TID2013 (compared to 0.908) and 0.977 on CSIQ (compared to 0.965).
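The sampling scheme described in the abstract can be illustrated with a generic InfoNCE-style contrastive loss. This is a hedged sketch under stated assumptions, not the paper's actual objective: the anchor is assumed to be a whole-image embedding (the abstract does not say what the anchor is), and the names `cosine` and `info_nce`, the temperature value, and the toy 2-D embeddings are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors given as lists."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce(anchor, positive, negatives, temperature=0.1):
    """Generic InfoNCE loss: -log softmax score of the positive pair.

    Assumptions (not from the paper): the anchor is a whole-image
    embedding, the positive is the salient-patch embedding, and the
    negatives pool intra-class patches (same image) with inter-class
    patches (other images).
    """
    pos_logit = cosine(anchor, positive) / temperature
    logits = [pos_logit] + [cosine(anchor, n) / temperature for n in negatives]
    # Log-sum-exp with max subtraction for numerical stability.
    m = max(logits)
    log_denominator = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_denominator - pos_logit


# Usage with toy 2-D embeddings (illustrative values only).
anchor = [1.0, 0.0]
salient_patch = [0.9, 0.1]        # positive
intra_negatives = [[0.0, 1.0]]    # non-salient patch, same image
inter_negatives = [[-1.0, 0.0]]   # patch from a different image
loss = info_nce(anchor, salient_patch, intra_negatives + inter_negatives)
```

The loss shrinks as the anchor moves closer to the salient patch and farther from both kinds of negatives, which is the separation behavior the abstract attributes to the intra-class and inter-class negatives.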
