OpenCOLE: Towards Reproducible Automatic Graphic Design Generation

Naoto Inoue; Kento Masui; Wataru Shimoda; Kota Yamaguchi

TL;DR

OpenCOLE tackles reproducibility in automatic graphic design by delivering an open-source pipeline trained exclusively on public data (Crello) and releasing the results to foster community development. It preserves COLE's four-stage architecture (design-plan generation, image synthesis, typography generation, and rendering) while adapting each component to public datasets and open models. GPT4V-based evaluation on the DESIGNERINTENTION benchmark indicates that OpenCOLE achieves performance close to COLE, though state-of-the-art text-to-image systems such as SDXL1.0 and DALL-E3 still outperform it. The work demonstrates that democratizing automated graphic-design tools is feasible, while highlighting the need for open, robust evaluation frameworks and for continued improvement toward higher-quality, editable designs.

Abstract

Automatic generation of graphic designs has recently received considerable attention. However, the state-of-the-art approaches are complex and rely on proprietary datasets, which creates reproducibility barriers. In this paper, we propose an open framework for automatic graphic design called OpenCOLE, where we build a modified version of the pioneering COLE and train our model exclusively on publicly available datasets. Based on GPT4V evaluations, our model shows promising performance comparable to the original COLE. We release the pipeline and training results to encourage open development.

Paper Structure (15 sections, 3 figures, 1 table)

This paper contains 15 sections, 3 figures, 1 table.

Introduction
Method
Design Plan Generation
Image Generation
Typography Generation
Experiments
Implementation Details
Dataset for In-Context Learning
Design Plan Generation
Image Generation
Typography Generation
Benchmark, Metrics, and Baselines
Quantitative Evaluation
Qualitative Evaluation
Discussion

Figures (3)

Figure 1: The architecture of OpenCOLE. We follow the architecture of COLE with adjustments. The user intention is first converted into a design plan using GPT3.5 with in-context learning. The image generation module and the typography generation module then synthesize design elements according to the design plan. Finally, the graphic renderer composes the final image.
Figure 2: Failure cases of OpenCOLE.
Figure 3: Comparison between COLE and OpenCOLE. In each row, the left column shows the user intention, and the middle and right images are the designs generated from it by COLE and OpenCOLE, respectively.
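
The data flow described in the Figure 1 caption (intention, design plan, image and typography, rendering) can be sketched roughly as below. This is only an illustrative skeleton and not the released OpenCOLE code: every name here (`DesignPlan`, `TextElement`, `build_prompt`, `plan_design`, `generate_image`, `generate_typography`, `render`, `fake_llm`) is hypothetical, the plan format is an assumption, and the image and typography stages are string placeholders standing in for the actual models.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class DesignPlan:
    """Intermediate plan derived from the user intention (fields are assumed)."""
    image_description: str
    texts: List[str]


@dataclass
class TextElement:
    text: str
    x: int
    y: int
    font_size: int


@dataclass
class Design:
    background: str  # placeholder standing in for a generated image
    text_elements: List[TextElement] = field(default_factory=list)


def build_prompt(intention: str, examples: List[Tuple[str, str]]) -> str:
    """Assemble a few-shot prompt from (intention, plan) example pairs."""
    shots = "\n\n".join(f"Intention: {i}\nPlan: {p}" for i, p in examples)
    return f"{shots}\n\nIntention: {intention}\nPlan:"


def plan_design(intention: str, examples: List[Tuple[str, str]],
                llm: Callable[[str], str]) -> DesignPlan:
    """Stage 1: in-context learning; `llm` is any prompt -> text callable."""
    reply = llm(build_prompt(intention, examples))
    # Assumed reply format: first line describes the image, the rest are texts.
    lines = [ln.strip() for ln in reply.splitlines() if ln.strip()]
    if not lines:
        raise ValueError("language model returned an empty plan")
    return DesignPlan(image_description=lines[0], texts=lines[1:])


def generate_image(plan: DesignPlan) -> str:
    """Stage 2: placeholder for a text-to-image model call."""
    return f"<image: {plan.image_description}>"


def generate_typography(plan: DesignPlan, background: str) -> List[TextElement]:
    """Stage 3: placeholder layout; stacks texts vertically, first one largest."""
    return [
        TextElement(text=t, x=40, y=40 + 60 * i, font_size=48 if i == 0 else 24)
        for i, t in enumerate(plan.texts)
    ]


def render(design: Design) -> str:
    """Stage 4: compose the elements; here just a textual summary."""
    rows = [design.background]
    rows += [f"({e.x},{e.y}) {e.font_size}pt {e.text}" for e in design.text_elements]
    return "\n".join(rows)


def generate_design(intention: str, examples: List[Tuple[str, str]],
                    llm: Callable[[str], str]) -> Design:
    """Run the stages in order, keeping text elements separate and editable."""
    plan = plan_design(intention, examples, llm)
    background = generate_image(plan)
    return Design(background, generate_typography(plan, background))


def fake_llm(prompt: str) -> str:
    """Stand-in for a real language model, used only to exercise the sketch."""
    return "A sunny beach\nSummer Sale\n50% off"


EXAMPLES = [("Announce a bake sale", "A table of pastries\nBake Sale\nSaturday")]
design = generate_design("Promote a summer sale", EXAMPLES, fake_llm)
print(render(design))
```

Passing the language model in as a plain callable keeps the sketch independent of any particular API client, and keeping the text elements as structured data rather than pixels reflects the summary's emphasis on editable designs.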
