A Lightweight Parallel Framework for Blind Image Quality Assessment
Qunyue Huang, Bin Fang
TL;DR
The paper addresses blind image quality assessment (BIQA) with a lightweight, end-to-end framework (LPF) that reduces model complexity and training time while matching or surpassing state-of-the-art accuracy. It couples a frozen pre-trained encoder (VGG16) with a compact Feature Embedding Network (FEN), two self-supervised auxiliary tasks (sample-level category prediction and batch-level quality comparison), and a Distortion-aware Quality Regression Network (DaQRN) that predicts mean opinion scores (MOS). Training uses three losses that enforce coarse distortion perception, robust latent representations, and accurate score prediction. At test time the auxiliary networks are discarded, so only the FEN and DaQRN (on top of the frozen encoder) are used. Experiments on eight benchmark datasets show superior accuracy at lower computational cost, along with strong cross-dataset generalization and fast convergence. This makes the approach suitable for resource-constrained settings without sacrificing perceptual accuracy.
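As a reading aid, the inference path and training objective described above can be written compactly. This is a sketch under stated assumptions: the symbols $E$ (frozen VGG16 encoder), $F$ (FEN), $R$ (DaQRN), the loss names, and the weighted-sum form with hypothetical weights $\lambda_1, \lambda_2$ are illustrative notation, not taken from the paper, which may combine the three losses differently.

```latex
\begin{aligned}
\mathbf{z} &= F\big(E(x)\big), \qquad \hat{q} = R(\mathbf{z}) && \text{(inference: encoder, FEN, DaQRN)}\\
\mathcal{L}_{\text{total}} &= \mathcal{L}_{\text{reg}}(\hat{q}, q) + \lambda_{1}\,\mathcal{L}_{\text{cat}} + \lambda_{2}\,\mathcal{L}_{\text{cmp}} && \text{(training only; $\lambda_1, \lambda_2$ hypothetical)}
\end{aligned}
```

Here $q$ is the ground-truth MOS. The auxiliary heads that feed $\mathcal{L}_{\text{cat}}$ and $\mathcal{L}_{\text{cmp}}$ are dropped at test time, which is why they do not appear in the inference path.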
Abstract
Existing blind image quality assessment (BIQA) methods focus on designing complicated networks based on convolutional neural networks (CNNs) or transformers. In addition, some BIQA methods improve model performance through two-stage training. Despite significant advances, these methods substantially increase the parameter count and thus require more training time and computational resources. To tackle these issues, we propose a lightweight parallel framework (LPF) for BIQA. First, we extract visual features using a pre-trained feature extraction network. Next, we construct a simple yet effective feature embedding network (FEN) that transforms the visual features into latent representations containing salient distortion information. To improve the robustness of these latent representations, we present two novel self-supervised subtasks: a sample-level category prediction task and a batch-level quality comparison task. The sample-level category prediction task helps the model achieve coarse-grained distortion perception. The batch-level quality comparison task augments the training data and thereby improves the robustness of the latent representations. Finally, the latent representations are fed into a distortion-aware quality regression network (DaQRN), which simulates the human visual system (HVS) to generate accurate quality scores. Experimental results on multiple benchmark datasets demonstrate that the proposed method outperforms state-of-the-art approaches. Moreover, extensive analyses show that the proposed method has lower computational complexity and faster convergence.
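The abstract does not specify how the batch-level quality comparison task is constructed. One plausible reading, sketched here purely as an assumption, is that every unordered pair of images in a training batch becomes a comparison example labeled by which image has the higher MOS, so a batch of $B$ images yields up to $B(B-1)/2$ extra training signals. The helper names `comparison_pairs` and `pairwise_bce` are hypothetical, and the loss is a RankNet-style pairwise logistic loss swapped in for illustration, not the paper's formulation.

```python
import math
from itertools import combinations


def comparison_pairs(mos):
    """Turn one batch of MOS labels into pairwise comparison examples.

    Returns (i, j, label) for every unordered pair i < j, where label is 1.0
    if image i has the higher MOS and 0.0 otherwise. Pairs with equal MOS are
    skipped because they carry no ordering signal (an assumption).
    """
    pairs = []
    for i, j in combinations(range(len(mos)), 2):
        if mos[i] == mos[j]:
            continue
        pairs.append((i, j, 1.0 if mos[i] > mos[j] else 0.0))
    return pairs


def pairwise_bce(scores, pairs):
    """Mean RankNet-style loss over the pairs (hypothetical stand-in).

    For each pair the logit is z = s_i - s_j and the loss is the binary
    cross-entropy between sigmoid(z) and the pair label, computed in the
    numerically stable form max(z, 0) - z * y + log(1 + exp(-|z|)).
    """
    if not pairs:
        return 0.0
    total = 0.0
    for i, j, y in pairs:
        z = scores[i] - scores[j]
        total += max(z, 0.0) - z * y + math.log1p(math.exp(-abs(z)))
    return total / len(pairs)


if __name__ == "__main__":
    mos = [3.1, 4.5, 2.0, 3.8]      # hypothetical batch of 4 MOS labels
    pairs = comparison_pairs(mos)   # 4 images -> 6 comparison examples
    good = list(mos)                # predicted scores that respect the ordering
    bad = [-s for s in mos]         # predicted scores with the ordering reversed
    print(len(pairs), pairwise_bce(good, pairs) < pairwise_bce(bad, pairs))
```

For a batch of four images, `comparison_pairs` returns six labeled pairs, and scores that respect the MOS ordering give a lower loss than reversed scores. Skipping ties is a design choice in this sketch, since an equal-MOS pair gives no ordering signal.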
