PosterIQ: A Design Perspective Benchmark for Poster Understanding and Generation

Yuheng Feng, Wen Zhang, Haodong Duan, Xingxing Zou

Abstract

We present PosterIQ, a design-driven benchmark for poster understanding and generation, annotated across composition structure, typographic hierarchy, and semantic intent. It includes 7,765 image-annotation instances and 822 generation prompts spanning real, professional, and synthetic cases. To bridge visual design cognition and generative modeling, we define tasks for layout parsing, text-image correspondence, typography/readability and font perception, design quality assessment, and controllable, composition-aware generation with metaphor. We evaluate state-of-the-art MLLMs and diffusion-based generators, finding persistent gaps in visual hierarchy, typographic semantics, saliency control, and intention communication; commercial models lead on high-level reasoning but act as insensitive automatic raters, while generators render text well yet struggle with composition-aware synthesis. Extensive analyses show PosterIQ is both a quantitative benchmark and a diagnostic tool for design reasoning, offering reproducible, task-specific metrics. We aim to catalyze models' creativity and integrate human-centred design principles into generative vision-language systems.

Paper Structure (17 sections, 15 equations, 28 figures, 7 tables)

Figures (28)

  • Figure 1: Overview of the benchmark, which includes over a dozen tasks.
  • Figure 2: Qualitative comparison of four models on three layout-related tasks. For Text Localization and Layout Generation, the predicted bounding boxes are shown in red. For the Empty Space task, the selected patch IDs are highlighted in the image.
  • Figure 3: Qualitative comparison of four models on five generation tasks.
  • Figure 4: Qualitative comparison of model outputs over supervision-guided iterations.
  • Figure 5: Benchmark statistics for understanding tasks (top) and generation tasks (bottom).
  • ...and 23 more figures