From Elements to Design: A Layered Approach for Automatic Graphic Design Composition

Jiawei Lin, Shizhao Sun, Danqing Huang, Ting Liu, Ji Li, Jiang Bian

TL;DR

LaDeCo addresses the challenge of automatic holistic graphic design composition by introducing a layered design principle that organizes input multimodal elements into semantic layers and generates layer-specific attributes with context from previously rendered layers. It combines a layer planning module (utilizing GPT-4o) with a layered design composition process built on Large Multimodal Models, enabling sequential, context-aware design generation. Experiments on Crello datasets show LaDeCo achieving state-of-the-art performance on holistic design composition and outperforming task-specific baselines on subtasks like content-aware layout and typography, with ablations validating the importance of layer planning, layering, and data size. The approach supports flexible subtask handling (e.g., partial layer guidance) and practical applications such as resolution adjustment, element filling, and design variation, with potential for end-to-end content creation when integrated with image-generation models.

Abstract

In this work, we investigate automatic design composition from multimodal graphic elements. Although recent studies have developed various generative models for graphic design, they usually face two limitations: (1) they focus only on certain subtasks and are far from achieving the full design composition task; (2) they do not consider the hierarchical information of graphic designs during generation. To tackle these issues, we introduce the layered design principle into Large Multimodal Models (LMMs) and propose a novel approach, called LaDeCo, to accomplish this challenging task. Specifically, LaDeCo first performs layer planning for a given element set, dividing the input elements into different semantic layers according to their contents. Based on the planning results, it then predicts the element attributes that control the design composition in a layer-wise manner, and includes the rendered image of previously generated layers in the context. With this design, LaDeCo decomposes the difficult task into smaller, manageable steps, making the generation process smoother and clearer. The experimental results demonstrate the effectiveness of LaDeCo in design composition. Furthermore, we show that LaDeCo enables some interesting applications in graphic design, such as resolution adjustment, element filling, and design variation. In addition, it even outperforms specialized models in some design subtasks without any task-specific training.
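
The two stages described in the abstract (layer planning, then layer-wise attribute prediction with the rendered earlier layers as context) can be illustrated with a minimal Python sketch. This is not the authors' implementation. The layer names in `LAYER_ORDER`, the `Element` class, and the `toy_*` functions are all hypothetical stand-ins: the labeler plays the role of the layer planning model, the predictor the role of the fine-tuned LMM, and the renderer returns a list of names instead of an image.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple

# Assumption: a back-to-front layer order. The text above does not name the layers.
LAYER_ORDER = ["background", "underlay", "image", "text", "embellishment"]


@dataclass
class Element:
    name: str
    label: Optional[str] = None               # semantic label set during planning
    attrs: Optional[Dict[str, float]] = None  # predicted attributes (position, etc.)


def plan_layers(elements: List[Element],
                labeler: Callable[[Element], str]) -> List[Tuple[str, List[Element]]]:
    """Label every element, then group the elements into ordered, non-empty layers."""
    layers: Dict[str, List[Element]] = {layer: [] for layer in LAYER_ORDER}
    for el in elements:
        el.label = labeler(el)
        layers[el.label].append(el)
    return [(layer, layers[layer]) for layer in LAYER_ORDER if layers[layer]]


def compose(elements, labeler, predictor, renderer):
    """Generate attributes layer by layer, feeding back the rendered partial design."""
    canvas = None   # nothing has been rendered before the first layer
    placed: List[Element] = []
    for layer, members in plan_layers(elements, labeler):
        # The predictor sees the current layer plus the rendering of earlier layers.
        for el, attrs in zip(members, predictor(layer, members, canvas)):
            el.attrs = attrs
        placed.extend(members)
        canvas = renderer(placed)   # intermediate design becomes the next context
    return placed, canvas


# Toy stand-ins so the sketch runs without any model or image library.
def toy_labeler(el: Element) -> str:
    return el.name.split("_")[0]


def toy_predictor(layer, members, canvas):
    depth = float(LAYER_ORDER.index(layer))
    return [{"left": 0.1 * i, "top": 0.1 * i, "z": depth} for i, _ in enumerate(members)]


def toy_renderer(placed):
    return [el.name for el in placed]
```

Calling `compose([Element("text_title"), Element("background_photo")], toy_labeler, toy_predictor, toy_renderer)` places the background element first and the text element second, regardless of input order, because the plan fixes the generation order before any attributes are predicted.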

Paper Structure (23 sections, 1 equation, 19 figures, 3 tables)

Figures (19)

  • Figure 1: (a) Given a set of multimodal elements as input, our approach automatically composes them into a cohesive, balanced, and aesthetically pleasing graphic design. (b) Since a holistic design can be divided into different layers according to element semantics, we achieve the design composition task in a layer-by-layer manner. (c) Our approach is able to craft high-quality design pieces.
  • Figure 2: Illustration of our proposed LaDeCo. First, it utilizes GPT-4o to annotate the semantic labels for input elements. The layer structure is obtained from the predictions. Then LaDeCo fine-tunes LMMs to achieve layered design composition. After generating each layer, the intermediate designs will be rendered as images and fed back into LMMs to guide subsequent layer generation.
  • Figure 3: Qualitative comparison. We also show the ground truth designs for these samples. Please zoom in for a better view.
  • Figure 4: The rendered results of different layers from LaDeCo.
  • Figure 5: LaDeCo composes the same input elements into designs with different canvas sizes.
  • ...and 14 more figures