Dog-IQA: Standard-guided Zero-shot MLLM for Mix-grained Image Quality Assessment

Kai Liu, Ziqing Zhang, Wenbo Li, Renjing Pei, Fenglong Song, Xiaohong Liu, Linghe Kong, Yulun Zhang

TL;DR

Dog-IQA is a standard-guided, zero-shot, mix-grained IQA method. It requires no training and draws on the exceptional prior knowledge of multimodal large language models (MLLMs) to produce accurate IQA scores.

Abstract

Image quality assessment (IQA) serves as the gold standard for evaluating model performance in nearly all computer vision fields. However, existing IQA methods still suffer from poor out-of-distribution generalization and expensive training costs. To address these problems, we propose Dog-IQA, a standard-guided zero-shot mix-grained IQA method that is training-free and exploits the exceptional prior knowledge of multimodal large language models (MLLMs). To obtain accurate IQA scores, namely scores consistent with human judgments, we design an MLLM-based inference pipeline that imitates human experts. Specifically, Dog-IQA applies two techniques. First, Dog-IQA scores objectively against specific standards that exploit the MLLM's behavior patterns and minimize the influence of subjective factors. Second, Dog-IQA takes both local semantic objects and the whole image as input and aggregates their scores, leveraging local and global information. Dog-IQA achieves state-of-the-art (SOTA) performance among training-free methods and competitive performance with training-based methods in cross-dataset scenarios. Our code will be available at https://github.com/Kai-Liu001/Dog-IQA.
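The two techniques can be illustrated with a minimal Python sketch. It is written under stated assumptions and is not the authors' code. The five-level standard borrows the common bad/poor/fair/good/excellent absolute-category-rating labels rather than Dog-IQA's actual prompt. The MLLM call is replaced by hard-coded text replies, and `build_prompt`, `parse_score`, `SubImage`, and `aggregate` are hypothetical names. The aggregation follows the Figure 3 caption literally (area-weighted average of sub-image scores, then addition with $s_{seg}$). Because the caption does not define $s_{seg}$, the sketch takes it as an input.

```python
from __future__ import annotations

import re
from dataclasses import dataclass

# Hypothetical standard: each integer level is tied to a fixed description, so the
# MLLM picks a level instead of inventing a free-form score. Not the paper's prompt.
STANDARD: dict[int, str] = {
    1: "bad",
    2: "poor",
    3: "fair",
    4: "good",
    5: "excellent",
}


def build_prompt(standard: dict[int, str]) -> str:
    """Render the level-to-description mapping into a scoring instruction."""
    lines = [f"{level}: {desc}" for level, desc in sorted(standard.items())]
    return ("Rate the quality of this image using the standard below. "
            "Answer with a single integer.\n" + "\n".join(lines))


def parse_score(reply: str, standard: dict[int, str]) -> int | None:
    """Return the first integer in the reply if it is a defined level, else None."""
    match = re.search(r"\d+", reply)
    if match is None:
        return None
    level = int(match.group())
    return level if level in standard else None


@dataclass
class SubImage:
    score: float  # standard-guided score of one object-centred crop
    area: int     # mask area in pixels


def aggregate(subs: list[SubImage], s_seg: float) -> float:
    """Area-weighted mean of sub-image scores plus the s_seg term (Figure 3 caption)."""
    total = sum(s.area for s in subs)
    if total <= 0:
        raise ValueError("sub-images must have a positive total area")
    weighted = sum(s.score * s.area for s in subs) / total
    return weighted + s_seg


replies = ["4", "Score: 5", "3"]  # stand-ins for MLLM answers on three object crops
areas = [600, 300, 100]           # toy mask areas in pixels
scores = [parse_score(r, STANDARD) for r in replies]
subs = [SubImage(s, a) for s, a in zip(scores, areas) if s is not None]
print(aggregate(subs, s_seg=4.0))  # s_seg=4.0 is a hypothetical placeholder value
```

Constraining the reply to a fixed set of levels means an off-scale or unparseable answer is rejected rather than silently becoming a score. How such rejections should be handled (re-prompting or skipping the sub-image) is left open here.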

Paper Structure (12 sections, 1 equation, 6 figures, 6 tables, 1 algorithm)

Figures (6)

  • Figure 1: Comparison between Dog-IQA and existing state-of-the-art training-free IQA methods, demonstrating Dog-IQA's strong zero-shot IQA ability.
  • Figure 2: The idea behind Dog-IQA is inspired by the scoring procedure of human evaluators. When scoring, human evaluators are given standards that map quality to scores. They then start with the global quality and zoom in on objects to assess local quality. We integrate these key procedures and adapt their form to the MLLM's behavior patterns, formulating Dog-IQA.
  • Figure 3: The overall pipeline of our proposed Dog-IQA. It consists of three stages: segmentation, standard-guided scoring, and score aggregation. The input image is segmented into multiple sub-images centered on objects. The MLLM then scores them against quality standards. After an area-weighted average and addition with $s_{seg}$, the scores are aggregated into the final quality score.
  • Figure 4: Correlation between MOS and Dog-IQA's scores on SPAQ and KonIQ. The marginal histograms show the distributions of MOS and Dog-IQA's scores, and the points $(s^{*}, s_{Dog})$ cluster around the center. The regression line shows a linear correlation between Dog-IQA's scores and human scores.
  • Figure 5: Example images with their segmented images. We select images spanning a range of scores to illustrate Dog-IQA's ability. The upper-left number is the score, and the lower-right number is the area. The number of masks is shown in the upper-left part of the segmented image.
  • ...and 1 more figure
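Figure 4 relates MOS to Dog-IQA's scores. Agreement of this kind is commonly summarized in IQA by Spearman's rank-order correlation coefficient (SRCC) and Pearson's linear correlation coefficient (PLCC) between predictions and MOS. The plain-Python sketch below computes both on toy numbers, not data from the paper. It simplifies one point: many IQA benchmarks fit a nonlinear logistic mapping to the predictions before computing PLCC, and that step is omitted here. The function names are illustrative.

```python
from __future__ import annotations

import math


def pearson(x: list[float], y: list[float]) -> float:
    """PLCC: Pearson linear correlation between two equal-length score lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists with at least 2 items")
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (sx * sy)


def ranks(values: list[float]) -> list[float]:
    """1-based ranks; tied values share the average rank of their group."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over every position holding the same value as position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result


def spearman(x: list[float], y: list[float]) -> float:
    """SRCC: Pearson correlation computed on the ranks of both lists."""
    return pearson(ranks(x), ranks(y))


mos = [1.2, 2.5, 3.1, 4.0, 4.6]   # toy MOS values
pred = [2.0, 4.0, 5.0, 7.0, 9.0]  # toy predicted quality scores
print(f"SRCC={spearman(mos, pred):.3f}, PLCC={pearson(mos, pred):.3f}")
```

SRCC depends only on the ordering, so any monotonic rescaling of the predicted scores leaves it unchanged. PLCC additionally rewards the linear relationship that the regression line in Figure 4 visualizes.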