A Lightweight Parallel Framework for Blind Image Quality Assessment

Qunyue Huang, Bin Fang

TL;DR

The paper addresses blind image quality assessment (BIQA) with a lightweight, end-to-end framework (LPF) that reduces model complexity and training time while matching or surpassing state-of-the-art accuracy. It couples a fixed pre-trained encoder (VGG16) with a compact Feature Embedding Network (FEN), two self-supervised auxiliary tasks (category prediction and quality comparison), and a Distortion-aware Quality Regression Network (DaQRN) that predicts MOS scores. Training uses three losses that enforce coarse distortion perception, robust latent representations, and accurate score prediction. At test time only the FEN and DaQRN are required; the auxiliary networks are dropped. Empirical results on eight benchmark datasets show superior performance and lower computational cost, with strong cross-dataset generalization and fast convergence. The approach therefore suits resource-constrained settings while retaining high perceptual accuracy.

Abstract

Existing blind image quality assessment (BIQA) methods focus on designing complicated networks based on convolutional neural networks (CNNs) or transformers. In addition, some BIQA methods improve model performance through two-stage training. Despite these significant advances, such methods substantially increase the parameter count of the model and therefore require more training time and computational resources. To tackle these issues, we propose a lightweight parallel framework (LPF) for BIQA. First, we extract visual features using a pre-trained feature extraction network. Next, we construct a simple yet effective feature embedding network (FEN) that transforms the visual features into latent representations containing salient distortion information. To improve the robustness of the latent representations, we present two novel self-supervised subtasks: a sample-level category prediction task and a batch-level quality comparison task. The sample-level category prediction task helps the model with coarse-grained distortion perception. The batch-level quality comparison task augments the training data and thus improves the robustness of the latent representations. Finally, the latent representations are fed into a distortion-aware quality regression network (DaQRN), which simulates the human visual system (HVS) and thus generates accurate quality scores. Experimental results on multiple benchmark datasets demonstrate that the proposed method achieves superior performance over state-of-the-art approaches. Moreover, extensive analyses show that the proposed method has lower computational complexity and converges faster.
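
The batch-level quality comparison task can be pictured with a small sketch. The abstract does not give the exact formulation, so the version below is only one plausible reading: every ordered pair of images in a batch receives a binary label saying whether the first has the higher MOS. The function name `pairwise_comparison_labels` and the strict `>` convention are assumptions made for illustration, not the authors' definition.

```python
def pairwise_comparison_labels(mos_batch):
    """Build an n x n matrix of binary comparison labels for one batch.

    Assumed convention (not taken from the paper): entry [i][j] is 1 when
    image i has a strictly higher MOS than image j, and 0 otherwise.
    """
    n = len(mos_batch)
    return [
        [1 if mos_batch[i] > mos_batch[j] else 0 for j in range(n)]
        for i in range(n)
    ]


if __name__ == "__main__":
    # A batch of three hypothetical MOS values.
    for row in pairwise_comparison_labels([3.0, 1.0, 2.0]):
        print(row)
```

A batch of n images yields n × n comparison targets from only n ground-truth scores, which is one way to read the claim that the task augments the training data.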

Paper Structure (31 sections, 12 equations, 13 figures, 7 tables, 1 algorithm)

This paper contains 31 sections, 12 equations, 13 figures, 7 tables, 1 algorithm.

Figures (13)

  • Figure 1: Comparison of our method with prior approaches for BIQA. Prior works focused on designing complicated networks using CNNs or transformers. Moreover, two-stage methods were proposed to significantly improve the robustness of the models. With a lightweight network structure and a parallel training strategy, our method converges faster and has lower computational complexity while obtaining superior performance.
  • Figure 2: The network structure of the proposed method. The BIQA task is formulated into three parallel subtasks, including a quality regression task and two self-supervised auxiliary tasks consisting of a category prediction task and a quality comparison task.
  • Figure 3: The generation procedure of the category label, where $Q_1$, $Q_2$, and $Q_3$ denote the first quartile, second quartile, and third quartile.
  • Figure 4: Ground truth scores for each dataset after data preprocessing, with the first, second, and third quartiles marked.
  • Figure 5: Illustration of some distorted images with MOS and category labels generated by the proposed method.
  • ...and 8 more figures
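
The caption of Figure 3 says the category labels are generated from the quartiles $Q_1$, $Q_2$, and $Q_3$ of the quality scores. A minimal sketch of that idea follows, assuming four categories (one per quartile bin) and assuming that a score equal to a quartile falls into the lower bin. Neither detail is stated in this excerpt, and the function name `quartile_category_labels` is hypothetical.

```python
import statistics


def quartile_category_labels(scores):
    """Assign each score a coarse category in {0, 1, 2, 3} by quartile bin.

    Assumptions (not confirmed by the source): four bins, and a score that
    equals a quartile is placed in the lower bin.
    """
    # With n=4, statistics.quantiles returns the three cut points Q1, Q2, Q3.
    q1, q2, q3 = statistics.quantiles(scores, n=4, method="inclusive")
    labels = []
    for s in scores:
        if s <= q1:
            labels.append(0)
        elif s <= q2:
            labels.append(1)
        elif s <= q3:
            labels.append(2)
        else:
            labels.append(3)
    return (q1, q2, q3), labels


if __name__ == "__main__":
    # Hypothetical MOS values, used only to exercise the function.
    quartiles, labels = quartile_category_labels([1, 2, 3, 4, 5, 6, 7, 8, 9])
    print(quartiles, labels)
```

Because the labels are derived from the scores themselves, no extra annotation is needed, which matches the description of the category prediction task as self-supervised.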