DiagrammerGPT: Generating Open-Domain, Open-Platform Diagrams via LLM Planning

Abhay Zala; Han Lin; Jaemin Cho; Mohit Bansal

TL;DR

DiagrammerGPT tackles open-domain diagram generation by coupling LLM-based planning with a layout-grounded diagram generator. The two-stage pipeline first uses an LLM to produce detailed diagram plans, refined through a planner-auditor feedback loop, and then DiagramGLIGEN translates those plans into accurate diagrams with explicit text rendering. The authors introduce the AI2D-Caption dataset to benchmark text-to-diagram generation and demonstrate that DiagrammerGPT outperforms strong baselines on objective diagram-quality metrics and human judgments, while enabling vector-graphic exports and human-in-the-loop editing. This work highlights the potential of combining planning-focused LLMs with layout-grounded generation to improve diagrammatic information visualization and accessibility in education and research.

Abstract

Text-to-image (T2I) generation has seen significant growth over the past few years. Despite this, there has been little work on generating diagrams with T2I models. A diagram is a symbolic/schematic representation that explains information using structurally rich and spatially complex visualizations (e.g., a dense combination of related objects, text labels, directional arrows/lines, etc.). Existing state-of-the-art T2I models often fail at diagram generation because they lack fine-grained object layout control when many objects are densely connected via complex relations such as arrows/lines, and also often fail to render comprehensible text labels. To address this gap, we present DiagrammerGPT, a novel two-stage text-to-diagram generation framework leveraging the layout guidance capabilities of LLMs to generate more accurate diagrams. In the first stage, we use LLMs to generate and iteratively refine 'diagram plans' (in a planner-auditor feedback loop). In the second stage, we use a diagram generator, DiagramGLIGEN, and a text label rendering module to generate diagrams (with clear text labels) following the diagram plans. To benchmark the text-to-diagram generation task, we introduce AI2D-Caption, a densely annotated diagram dataset built on top of the AI2D dataset. We show that our DiagrammerGPT framework produces more accurate diagrams, outperforming existing T2I models. We also provide comprehensive analysis, including open-domain diagram generation, multi-platform vector graphic diagram generation, human-in-the-loop editing, and multimodal planner/auditor LLMs.
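The first stage described above (a plan of entities, relationships, and layouts, refined in a planner-auditor loop) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: the `DiagramPlan` schema, the `refine_plan` function, and the round limit are hypothetical, and the `toy_planner` and `toy_auditor` functions are deterministic stand-ins for what would be LLM calls in the paper.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

# Hypothetical schema: (x, y, width, height), normalized to the unit square.
Box = Tuple[float, float, float, float]

@dataclass
class Entity:
    name: str
    kind: str  # assumed categories, e.g. "object" or "text_label"
    box: Box

@dataclass
class Relationship:
    source: str
    target: str
    kind: str  # e.g. "arrow" or "line"

@dataclass
class DiagramPlan:
    entities: List[Entity] = field(default_factory=list)
    relationships: List[Relationship] = field(default_factory=list)

Auditor = Callable[[str, DiagramPlan], List[str]]
Planner = Callable[[str, DiagramPlan, List[str]], DiagramPlan]

def refine_plan(prompt: str, plan: DiagramPlan, planner: Planner,
                auditor: Auditor, max_rounds: int = 3) -> DiagramPlan:
    """Alternate auditing and re-planning until the auditor has no feedback."""
    for _ in range(max_rounds):
        feedback = auditor(prompt, plan)
        if not feedback:
            break
        plan = planner(prompt, plan, feedback)
    return plan

def toy_auditor(prompt: str, plan: DiagramPlan) -> List[str]:
    """Stand-in auditor: flags out-of-bounds boxes and dangling relationships."""
    issues = []
    names = {e.name for e in plan.entities}
    for e in plan.entities:
        x, y, w, h = e.box
        if x < 0 or y < 0 or x + w > 1 + 1e-9 or y + h > 1 + 1e-9:
            issues.append(f"box_out_of_bounds:{e.name}")
    for r in plan.relationships:
        if r.source not in names or r.target not in names:
            issues.append(f"dangling_relationship:{r.source}->{r.target}")
    return issues

def toy_planner(prompt: str, plan: DiagramPlan, feedback: List[str]) -> DiagramPlan:
    """Stand-in planner: clamps boxes into the canvas and drops dangling relationships."""
    fixed = []
    for e in plan.entities:
        x, y, w, h = e.box
        w, h = min(w, 1.0), min(h, 1.0)
        x = min(max(x, 0.0), 1.0 - w)
        y = min(max(y, 0.0), 1.0 - h)
        fixed.append(Entity(e.name, e.kind, (x, y, w, h)))
    names = {e.name for e in fixed}
    kept = [r for r in plan.relationships if r.source in names and r.target in names]
    return DiagramPlan(fixed, kept)

prompt = "The moon orbits the earth."
draft = DiagramPlan(
    entities=[
        Entity("earth", "object", (0.9, 0.5, 0.25, 0.25)),  # spills past the right edge
        Entity("moon", "object", (0.1, 0.1, 0.2, 0.2)),
    ],
    relationships=[
        Relationship("moon", "earth", "arrow"),
        Relationship("earth", "sun", "arrow"),  # "sun" is not in the plan
    ],
)
refined = refine_plan(prompt, draft, toy_planner, toy_auditor)
```

Keeping the planner and auditor as interchangeable callables mirrors the separation of roles in the loop: either one could be swapped for an LLM-backed function without changing `refine_plan`.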

Paper Structure (47 sections, 21 figures, 7 tables)

Introduction
Related Works
Text-to-Image Generation
Text-to-Image Generation with LLM-Guided Layouts
DiagrammerGPT: Method Details
Stage 1: Diagram Planning
Stage 2: Diagram Generation
Experimental Setup
AI2D-Caption Dataset
Baseline Models
Evaluation Metrics
Human Evaluation
Results and Discussion
Quantitative Results
Human Evaluation
...and 32 more sections

Figures (21)

Figure 1: An overview of DiagrammerGPT, our two-stage framework for open-domain diagram generation. In the first stage, diagram planning (see Stage 1: Diagram Planning), our LLM (GPT-4; OpenAI, 2023) takes a prompt and generates a diagram plan consisting of dense entities, fine-grained relationships, and precise layouts. The LLM then iteratively refines the plan to correct mistakes. In the second stage, diagram generation (see Stage 2: Diagram Generation), our DiagramGLIGEN outputs the diagram from the diagram plan, and we then render the text labels on the diagram.
Figure 2: Illustration of the first stage of DiagrammerGPT: diagram planning (see Stage 1: Diagram Planning). We use a planner LLM (e.g., GPT-4; OpenAI, 2023) to create the fine-grained layouts of diagrams, which we call diagram plans. We first generate an initial diagram plan from the input text prompt with an LLM (left). Then we iteratively refine the diagram plan in a feedback loop between the planner and auditor LLMs.
Figure 3: Illustration of the second stage of DiagrammerGPT: diagram generation (see Stage 2: Diagram Generation). We first generate the objects from the diagram plan with DiagramGLIGEN, our layout-guided diagram generation model. Then, we use Pillow to render clear text labels.
Figure 4: Example diagram annotation from the AI2D dataset (Kembhavi et al., 2016) (left) and our AI2D-Caption (right). AI2D-Caption additionally provides annotations of the diagram caption and bounding box region descriptions.
Figure 5: Example diagram generation results from baselines (fine-tuned Stable Diffusion v1.4 and AutomaTikZ) and our DiagrammerGPT on the AI2D-Caption test split. Our DiagrammerGPT correctly follows the caption while the baselines make several errors.
...and 16 more figures
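
The layout-then-label idea in Figures 1 and 3 can be illustrated without a diffusion model. The sketch below takes normalized bounding boxes from a plan, draws each object as a placeholder rectangle, and renders the labels as real text so they stay legible. It emits SVG with the Python standard library instead of drawing with Pillow as the paper does, and it is not the paper's vector-graphic export path; `to_pixels`, `center`, `plan_to_svg`, and the canvas size are all hypothetical names and values chosen for illustration.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

def to_pixels(box, width, height):
    """Convert a normalized (x, y, w, h) box to integer pixel coordinates."""
    x, y, w, h = box
    return (round(x * width), round(y * height), round(w * width), round(h * height))

def center(box, width, height):
    """Pixel coordinates of the center of a normalized box."""
    x, y, w, h = to_pixels(box, width, height)
    return (x + w // 2, y + h // 2)

def plan_to_svg(boxes, arrows, width=400, height=300):
    """Render named boxes and center-to-center arrows as an SVG string.

    boxes:  dict mapping an entity name to its normalized (x, y, w, h) box
    arrows: list of (source_name, target_name) pairs
    """
    parts = [
        f'<svg xmlns="http://www.w3.org/2000/svg" width="{width}" height="{height}">',
        '<defs><marker id="arrowhead" markerWidth="8" markerHeight="8" '
        'refX="8" refY="4" orient="auto"><path d="M0,0 L8,4 L0,8 z"/></marker></defs>',
    ]
    for name, box in boxes.items():
        x, y, w, h = to_pixels(box, width, height)
        cx, cy = center(box, width, height)
        # Placeholder for the generated object; a real pipeline would paste an image here.
        parts.append(f'<rect x="{x}" y="{y}" width="{w}" height="{h}" '
                     'fill="none" stroke="black"/>')
        # Labels are drawn as text, escaped so characters like "&" stay valid XML.
        parts.append(f'<text x="{cx}" y="{cy}" text-anchor="middle" '
                     f'dominant-baseline="middle">{escape(name)}</text>')
    for src, dst in arrows:
        x1, y1 = center(boxes[src], width, height)
        x2, y2 = center(boxes[dst], width, height)
        parts.append(f'<line x1="{x1}" y1="{y1}" x2="{x2}" y2="{y2}" '
                     'stroke="black" marker-end="url(#arrowhead)"/>')
    parts.append("</svg>")
    return "\n".join(parts)

boxes = {
    "Sun": (0.05, 0.35, 0.25, 0.3),
    "Earth & Moon": (0.6, 0.4, 0.3, 0.2),
}
svg = plan_to_svg(boxes, [("Sun", "Earth & Moon")])
root = ET.fromstring(svg)  # raises ParseError if the SVG is not well-formed XML
```

Writing `svg` to a file with an `.svg` extension gives an image that opens in a browser or a vector editor, where the boxes and labels remain individually editable.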
