Graphic Design with Large Multimodal Model

Yutao Cheng; Zhao Zhang; Maoke Yang; Hui Nie; Chunyuan Li; Xinglong Wu; Jie Shao

TL;DR

This paper introduces Hierarchical Layout Generation (HLG) to relax the predefined layer ordering in Graphic Layout Generation (GLG) and enable cohesive designs from unordered design elements. It proposes Graphist, the first end-to-end, large multimodal model-based layout generator that ingests RGB-A inputs and outputs a JSON protocol describing element coordinates, sizes, and hierarchy. To evaluate HLG, the authors develop Inverse Order Pair Ratio (IOPR) and GPT-4V Eval, demonstrating state-of-the-art performance on GLG and HLG tasks across Crello and CGL-V2 datasets, along with real-world and ablation studies. The work highlights Graphist’s potential to democratize graphic design by enabling flexible, automated composition while outlining limitations and avenues for reducing design homogeneity and environmental impact.
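The summary above names IOPR but does not define it here. As a minimal sketch, assuming the metric is the fraction of element pairs whose predicted relative layer order is inverted with respect to the ground truth, it could be computed as follows. The function name, the argument layout, and the choice to count all pairs are illustrative assumptions; the paper's exact definition (for example, restricting to overlapping pairs) may differ.

```python
from itertools import combinations


def inverse_order_pair_ratio(predicted_layers, true_layers):
    """Fraction of element pairs stacked in the opposite order to the ground truth.

    Both arguments are sequences where position i holds the layer index of
    element i (0 = bottom). This is an assumed reading of the metric's name,
    not the paper's verified formula.
    """
    if len(predicted_layers) != len(true_layers):
        raise ValueError("both orderings must cover the same elements")

    pairs = list(combinations(range(len(true_layers)), 2))
    if not pairs:
        return 0.0  # zero or one element: nothing can be inverted

    # A pair is inverted when the signs of the two layer differences disagree.
    inverted = sum(
        1
        for i, j in pairs
        if (predicted_layers[i] - predicted_layers[j])
        * (true_layers[i] - true_layers[j]) < 0
    )
    return inverted / len(pairs)
```

Under this reading, a lower value is better: 0.0 means every pair is stacked in the ground-truth order, and 1.0 means the whole stack is reversed.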

Abstract

In the field of graphic design, automating the integration of design elements into a cohesive multi-layered artwork not only boosts productivity but also paves the way for the democratization of graphic design. One existing practice is Graphic Layout Generation (GLG), which aims to lay out sequential design elements. It is constrained by the need for a predefined correct sequence of layers, which limits creative potential and increases user workload. In this paper, we present Hierarchical Layout Generation (HLG) as a more flexible and pragmatic setup, which creates graphic compositions from unordered sets of design elements. To tackle the HLG task, we introduce Graphist, the first layout generation model based on large multimodal models. Graphist efficiently reframes HLG as a sequence generation problem: it takes RGB-A images as input and outputs a JSON draft protocol indicating the coordinates, size, and order of each element. We develop new evaluation metrics for HLG. Graphist outperforms prior art and establishes a strong baseline for this field. Project homepage: https://github.com/graphic-design-ai/graphist
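To make the idea of a JSON draft protocol concrete, the sketch below shows one possible shape for such a record and a round trip through it. The abstract only states that the protocol carries each element's coordinates, size, and order; the field names (`index`, `x`, `y`, `width`, `height`, `layer`), the top-level `layers` key, and the helper functions are hypothetical and are not taken from the Graphist code.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class ElementPlacement:
    """Placement of one design element (all field names are illustrative)."""
    index: int   # which input element this entry refers to
    x: int       # left edge on the canvas, in pixels
    y: int       # top edge on the canvas, in pixels
    width: int
    height: int
    layer: int   # stacking order, 0 = bottom


def to_protocol(placements):
    """Serialize placements bottom-to-top into a JSON string."""
    ordered = sorted(placements, key=lambda p: p.layer)
    return json.dumps({"layers": [asdict(p) for p in ordered]}, indent=2)


def from_protocol(text):
    """Parse the JSON string back into placement records."""
    return [ElementPlacement(**entry) for entry in json.loads(text)["layers"]]


# Unordered input: the text element is listed first but sits above the background.
placements = [
    ElementPlacement(index=0, x=40, y=300, width=320, height=80, layer=1),
    ElementPlacement(index=1, x=0, y=0, width=400, height=400, layer=0),
]
draft = to_protocol(placements)
restored = from_protocol(draft)
```

A renderer would then composite the RGB-A elements in the order they appear in `restored`, so the bottom layer is drawn first.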

Paper Structure (27 sections, 1 equation, 6 figures, 7 tables)

Introduction
Related Work
Graphic Layout Generation
Large Multimodal Models
Task Formulation
Graphic Layout Generation
Hierarchical Layout Generation
Proposed Method
Graphist Architecture
Training Strategy
Experiment
Datasets
Crello
Evaluation Metrics
Inverse order pair ratio (IOPR).
...and 12 more sections

Figures (6)

Figure 1: Schematic diagram of hierarchical layout generation. (Left) A comparison between the traditional GLG task and the newly proposed HLG task. The major difference is that HLG relaxes the ordering constraint of GLG, so that unordered multimodal input elements can be processed. (Right) Errors in either layer sequencing or spatial positioning can significantly impact the overall quality of the design.
Figure 2: Graphist Pipeline. Graphist comprises three components: an RGB-A Encoder, a Visual Shrinker, and an LLM. It accepts a variety of design elements and generates a graphic composition in JSON format end-to-end.
Figure 3: A user-generated case via the Graphist web demo. The top-left figure represents the input design elements to Graphist. Below it, we present the corresponding output JSON code generated by Graphist. The final two images in the top row illustrate the visualized results: the first is the layout visualization, and the second is the graphic composition obtained by placing these elements according to the JSON protocol. Additional examples are available in Figure \ref{fig:user}.
Figure 4: Results visualization of the GLG task on the Crello dataset. The results for Flex-DM were derived from their open-source code, whereas the results for GPT-4V and Gemini-1.5-Pro were obtained in a zero-shot manner.
Figure 5: Comparison with SoTA method on the HLG task with real-world design elements. Both Gemini-1.5-Pro and Graphist models are tasked with HLG using identical real-world design elements. The outcomes indicate superior design quality achieved by our Graphist* when compared with Gemini-1.5-Pro.
...and 1 more figures
