TextLap: Customizing Language Models for Text-to-Layout Planning

Jian Chen, Ruiyi Zhang, Yufan Zhou, Jennifer Healey, Jiuxiang Gu, Zhiqiang Xu, Changyou Chen

TL;DR

This work uses a curated instruction-based layout planning dataset (InsLap) to customize LLMs to act as graphic designers, and shows that the resulting model outperforms strong baselines, including GPT-4 based methods, on image generation and graphical design benchmarks.

Abstract

Automatic generation of graphical layouts is crucial for many real-world applications, including designing posters, flyers, advertisements, and graphical user interfaces. Given the strong ability of large language models (LLMs) in both natural language understanding and generation, we believe that an LLM can be customized to help people create compelling graphical layouts starting from only the user's text instructions. We call our method TextLap (text-based layout planning). It uses a curated instruction-based layout planning dataset (InsLap) to customize LLMs to act as graphic designers. We demonstrate the effectiveness of TextLap and show that it outperforms strong baselines, including GPT-4 based methods, on image generation and graphical design benchmarks.

Paper Structure

This paper contains 44 sections, 10 figures, 8 tables.

Figures (10)

  • Figure 1: Overview of TextLap fine-tuned on InstLap. 1) TextLap can perform graphic design and output coordinates given a list of elements including images, texts, and scalable vector graphics (SVG). The image is rendered accordingly. 2) TextLap can extract key elements from text prompts and provide their coordinates. The image can then be rendered with image generation tools.
  • Figure 2: Overview of how the InstLap dataset is built. (a) shows how InstLap is built from the COCO dataset, using two data augmentations, one for the input instructions and one for the output layouts. (b) shows how the Crello dataset is incorporated into InstLap: visual elements are first described by Phi-3-Vision, and the descriptions are then added to the text instructions.
  • Figure 3: Loss curve on close-set layout generation with 80-class COCO labels.
  • Figure 4: An example from InstLap that is built based on the Crello dataset.
  • Figure 5: Generated visual and textual layout planning examples. Layouts are provided by TextLap given text prompts, and images are rendered by ARTIST [zhang2024artist] and InstanceDiffusion [wang2024instancediffusion], respectively.
  • ...and 5 more figures