You Only Train Once: A Unified Framework for Both Full-Reference and No-Reference Image Quality Assessment
Yi Ke Yun, Weisi Lin
TL;DR
The paper presents YOTO, a unified transformer-based framework that handles both FR and NR image quality assessment with a single training process. It combines a shared encoder with a Hierarchical Attention adapter and a Semantic Distortion Aware module to model spatial distortions and their semantic impact across encoder stages. Empirical results on multiple FR and NR benchmarks, including PIPAL, show state-of-the-art performance for both FR and NR IQA, with joint FR/NR training further improving NR performance while keeping FR performance on par with the state of the art. The approach offers improved consistency between FR and NR scores and holds promise for extension to multi-modal IQA scenarios in real-world applications.
Abstract
Although recent efforts in image quality assessment (IQA) have achieved promising performance, a considerable gap remains compared to the human visual system (HVS). One significant disparity is that humans transition seamlessly between full-reference (FR) and no-reference (NR) tasks, whereas existing models are constrained to either FR or NR tasks. This disparity means that two distinct systems must be designed, which greatly diminishes a model's versatility. Our focus is therefore on unifying FR and NR IQA under a single framework. Specifically, we first employ an encoder to extract multi-level features from input images. A Hierarchical Attention (HA) module is then proposed as a universal adapter for both FR and NR inputs to model the spatial distortion at each encoder stage. Furthermore, considering that different distortions contaminate encoder stages and damage image semantic meaning differently, a Semantic Distortion Aware (SDA) module is proposed to examine feature correlations between shallow and deep layers of the encoder. By adopting HA and SDA, the proposed network can effectively perform both FR and NR IQA. When our proposed model is trained independently on NR or FR IQA tasks, it outperforms existing models and achieves state-of-the-art performance. Moreover, when trained jointly on NR and FR IQA tasks, it further enhances NR IQA performance while performing on par with state-of-the-art FR IQA methods. You only train once to perform both IQA tasks. Code will be released at: https://github.com/BarCodeReader/YOTO.
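To make the idea of a universal adapter concrete, the following is a minimal sketch and not the released implementation. It assumes that the adapter is a scaled dot-product attention in which queries come from the distorted image, while keys and values come from the reference when one is given (FR) and from the distorted image itself otherwise (NR). It also assumes that the shallow/deep feature correlation can be illustrated by a cosine similarity between mean-pooled stage features. The names `unified_adapter` and `stage_correlation` are hypothetical, and the sketch has no learned weights.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def attend(queries, keys, values):
    """Scaled dot-product attention over lists of token vectors."""
    scale = math.sqrt(len(keys[0]))
    dim = len(values[0])
    out = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / scale for k in keys]
        weights = softmax(scores)
        out.append([sum(w * v[d] for w, v in zip(weights, values))
                    for d in range(dim)])
    return out


def unified_adapter(dist_tokens, ref_tokens=None):
    """Hypothetical FR/NR adapter for one encoder stage (an assumption).

    FR: distorted tokens attend to reference tokens (cross-attention).
    NR: distorted tokens attend to themselves (self-attention).
    """
    context = dist_tokens if ref_tokens is None else ref_tokens
    return attend(dist_tokens, context, context)


def mean_pool(tokens):
    """Average token vectors into a single feature vector."""
    n = len(tokens)
    return [sum(t[d] for t in tokens) / n for d in range(len(tokens[0]))]


def stage_correlation(shallow_tokens, deep_tokens):
    """Cosine similarity between pooled shallow and deep features.

    A stand-in for a shallow/deep correlation, not the SDA module itself.
    """
    a, b = mean_pool(shallow_tokens), mean_pool(deep_tokens)
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm > 0 else 0.0


# Toy features: three tokens of dimension two for a single encoder stage.
dist = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
ref = [[1.0, 0.1], [0.1, 1.0], [0.4, 0.6]]

nr_features = unified_adapter(dist)       # NR path: no reference available
fr_features = unified_adapter(dist, ref)  # FR path: reference available
corr = stage_correlation(nr_features, fr_features)
```

The same function serves both tasks, and only the source of keys and values changes. This is the property that lets a single set of weights be trained once and used for FR and NR inputs alike.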
