Can GPTs Evaluate Graphic Design Based on Design Principles?

Daichi Haraguchi, Naoto Inoue, Wataru Shimoda, Hayato Mitani, Seiichi Uchida, Kota Yamaguchi

TL;DR

This paper compares the behavior of GPT-based evaluation and heuristic evaluation based on design principles using human annotations collected from 60 subjects and reveals that, while GPTs cannot distinguish small details, they have a reasonably good correlation with human annotation and exhibit a similar tendency to heuristic metrics based on design principles.

Abstract

Recent advancements in foundation models show promising capability in graphic design generation. Several studies have started employing Large Multimodal Models (LMMs) to evaluate graphic designs, assuming that LMMs can properly assess their quality, but it is unclear if the evaluation is reliable. One way to evaluate the quality of graphic design is to assess whether the design adheres to fundamental graphic design principles, which are the designer's common practice. In this paper, we compare the behavior of GPT-based evaluation and heuristic evaluation based on design principles using human annotations collected from 60 subjects. Our experiments reveal that, while GPTs cannot distinguish small details, they have a reasonably good correlation with human annotation and exhibit a similar tendency to heuristic metrics based on design principles, suggesting that they are indeed capable of assessing the quality of graphic design. Our dataset is available at https://cyberagentailab.github.io/Graphic-design-evaluation .

Paper Structure

This paper contains 20 sections and 10 figures.

Figures (10)

  • Figure 1: Negative examples of three design principles.
  • Figure 2: How graphic designs are rated by GPT and by humans. The detailed input prompts for GPT are described in the Appendix.
  • Figure 3: Correlation between human annotation and heuristic or GPT scores. Here, $r$ is the Pearson correlation coefficient.
  • Figure 4: Graphic designs with their alignment scores.
  • Figure 5: Correlation coefficient of the scores between human evaluation and each method.
  • ...and 5 more figures
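
Figures 3 and 5 report agreement between human annotation and each scoring method as a Pearson correlation coefficient $r$. As a minimal sketch of how such a coefficient can be computed, the snippet below implements the standard definition (covariance divided by the product of the standard deviations) in plain Python. The function name `pearson_r` and the score lists are hypothetical illustrations; they are not the paper's code or data.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length score lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Deviations of each score from its mean
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]
    # Numerator: sum of products of deviations (unnormalized covariance)
    cov = sum(a * b for a, b in zip(dx, dy))
    # Denominator: square root of the product of summed squared deviations
    norm = math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    if norm == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / norm


# Hypothetical ratings for five designs (made up, not from the paper's dataset)
human_scores = [1.0, 2.0, 3.0, 4.0, 5.0]
model_scores = [2.0, 1.0, 4.0, 3.0, 5.0]

r = pearson_r(human_scores, model_scores)
print(f"r = {r:.2f}")  # prints "r = 0.80"
```

Values of $r$ near 1 indicate that the method ranks and spaces designs much as the human annotators do, while values near 0 indicate no linear relationship.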