HarmonyIQA: Pioneering Benchmark and Model for Image Harmonization Quality Assessment

Zitong Xu, Huiyu Duan, Guangji Ma, Liu Yang, Jiarui Wang, Qingbo Wu, Xiongkuo Min, Guangtao Zhai, Patrick Le Callet

TL;DR

The paper tackles the misalignment between traditional image quality assessment and human perception in image harmonization by introducing HarmonyIQAD, the first dedicated harmony-quality database, with 1,350 harmonized images and 28,350 subjective ratings. It then proposes HarmonyIQA, a large multimodal evaluator that combines visual features from a vision encoder with user prompts via a pre-trained LLM, adapted through instruction tuning and LoRA in a two-stage training regime. Experiments show that HarmonyIQA achieves state-of-the-art performance on HarmonyIQAD and competitive results on standard IQA benchmarks, and that it generalizes better across datasets than self-supervised baselines. The authors state that both the database and the model will be released publicly upon publication, to support the evaluation and development of non-generative and generative image harmonization algorithms (NGIHAs and GIHAs).

Abstract

Image composition involves extracting a foreground object from one image and pasting it into another image through image harmonization algorithms (IHAs), which aim to adjust the appearance of the foreground object to better match the background. Existing image quality assessment (IQA) methods may fail to align with human visual preference on image harmonization because they are insensitive to minor color or lighting inconsistencies. To address this issue and facilitate the advancement of IHAs, we introduce the first Image Quality Assessment Database for image Harmony evaluation (HarmonyIQAD), which consists of 1,350 harmonized images generated by 9 different IHAs and the corresponding human visual preference scores. Based on this database, we propose a Harmony Image Quality Assessment model (HarmonyIQA) to predict human visual preference for harmonized images. Extensive experiments show that HarmonyIQA achieves state-of-the-art performance on human visual preference evaluation for harmonized images and competitive results on traditional IQA tasks. Furthermore, cross-dataset evaluation shows that HarmonyIQA generalizes better than self-supervised learning-based IQA methods. Both HarmonyIQAD and HarmonyIQA will be made publicly available upon paper publication.

Paper Structure (14 sections, 2 equations, 4 figures, 6 tables)

Figures (4)

  • Figure 1: Example of a composite image and the harmonization process.
  • Figure 2: An overview of the content and rating GUI of HarmonyIQAD. (a) Example images from our database, which contains reference images, composite images, and harmonized images. (b) Illustration of the GUI used for subjective rating.
  • Figure 3: (a) Distribution of mean opinion scores (MOSs) in HarmonyIQAD. (b) Mean and standard deviation of the MOSs for each IHA.
  • Figure 4: The HarmonyIQA model consists of two encoders: a visual encoder for extracting image features and a text encoder for processing user prompt features. These features are aligned by a trainable projector and passed into a pre-trained LLM, from which the last hidden states are selected. In the first training stage, these hidden states are decoded through a text decoder and trained against the text labels with a cross-entropy loss. In the second training stage, the hidden state of the token just before the score is decoded through a quality score decoder and trained against the numeric score with a mean squared error loss. LoRA weights are introduced to the visual encoder and LLM to adapt the model for the quality assessment task.
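
The two training objectives named in the Figure 4 caption can be illustrated with a minimal sketch. It assumes the standard definitions of cross-entropy over token logits (stage one) and mean squared error against the numeric score (stage two). The function names and the single linear layer standing in for the quality score decoder are hypothetical and are not taken from the paper, which does not specify the decoder's architecture in this excerpt.

```python
import math


def cross_entropy(logits, target_index):
    """Stage one (assumed standard form): negative log-likelihood of the
    target text token, given unnormalized logits over the vocabulary."""
    max_logit = max(logits)  # subtract the max for numerical stability
    log_sum_exp = max_logit + math.log(
        sum(math.exp(x - max_logit) for x in logits)
    )
    return log_sum_exp - logits[target_index]


def linear_score_decoder(hidden_state, weights, bias):
    """Hypothetical quality score decoder: one linear map from the hidden
    state of the token just before the score to a scalar quality score."""
    return sum(h * w for h, w in zip(hidden_state, weights)) + bias


def mean_squared_error(predicted, target):
    """Stage two (assumed standard form): MSE between predicted scores
    and the ground-truth numeric scores."""
    return sum((p - t) ** 2 for p, t in zip(predicted, target)) / len(predicted)


# Toy usage with made-up numbers (not data from the paper).
stage_one_loss = cross_entropy([2.0, 0.5, -1.0], target_index=0)
predicted_score = linear_score_decoder([1.0, 2.0], weights=[0.5, 0.25], bias=0.1)
stage_two_loss = mean_squared_error([predicted_score], [1.5])
```

In a real model both losses would be backpropagated through the LoRA weights and the projector. This sketch only shows how each stage's target (text token versus numeric score) pairs with its loss.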