LayoutCoT: Unleashing the Deep Reasoning Potential of Large Language Models for Layout Generation

Hengyu Shi, Junhao Su, Tianyang Han, Junfeng Luo, Jialin Gao

TL;DR

LayoutCoT tackles training-free layout generation by eliciting deep reasoning from LLMs through a layout-aware RAG and a three-stage Chain-of-Thought refinement. It uses a dissimilarity-based retrieval score $\mathrm{LTSim}(\mathcal{L}, \hat{\mathcal{L}})$ to fetch the top-$K$ exemplar layouts, generates a coarse layout, and then refines it in three stages (Stage 1: position; Stage 2: size and placement; Stage 3: fine-grained adjustment), all within a single prompt loop. The approach achieves state-of-the-art results on five public datasets across content-aware, constraint-explicit, and text-to-layout tasks without any task-specific training, with GPT-4 outperforming specialized deep-reasoning models on some tasks. The work demonstrates the practical potential of training-free LLM-based layout design as a versatile, data-efficient path for real-world UI and graphic design tasks.

Abstract

Conditional layout generation aims to automatically generate visually appealing and semantically coherent layouts from user-defined constraints. While recent methods based on generative models have shown promising results, they typically require substantial amounts of training data or extensive fine-tuning, limiting their versatility and practical applicability. Alternatively, some training-free approaches leveraging in-context learning with Large Language Models (LLMs) have emerged, but they often suffer from limited reasoning capabilities and overly simplistic ranking mechanisms, which restrict their ability to generate consistently high-quality layouts. To this end, we propose LayoutCoT, a novel approach that leverages the reasoning capabilities of LLMs through a combination of Retrieval-Augmented Generation (RAG) and Chain-of-Thought (CoT) techniques. Specifically, LayoutCoT transforms layout representations into a standardized serialized format suitable for processing by LLMs. A layout-aware RAG module retrieves relevant exemplars and prompts the LLM to generate a coarse layout. This preliminary layout, together with the selected exemplars, is then fed into a specially designed CoT reasoning module for iterative refinement, significantly enhancing both semantic coherence and visual quality. We conduct extensive experiments on five public datasets spanning three conditional layout generation tasks. Experimental results demonstrate that LayoutCoT achieves state-of-the-art performance without requiring training or fine-tuning. Notably, our CoT reasoning module enables standard LLMs, even those without explicit deep reasoning abilities, to outperform specialized deep-reasoning models such as DeepSeek-R1, highlighting the potential of our approach in unleashing the deep reasoning capabilities of LLMs for layout generation tasks.
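
The pipeline described in the abstract (serialize layouts, retrieve exemplars, generate a coarse layout, then refine it stage by stage) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration rather than the authors' implementation: the `Element` type, the one-line-per-element serialization, the prompt wording, and especially `toy_similarity`, which is a simple IoU-based stand-in and not the LTSim score used by the paper. The `llm` argument is any callable that maps a prompt string to a response string.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass(frozen=True)
class Element:
    """One layout element; coordinates are assumed normalized to [0, 1]."""
    label: str
    x: float  # left edge
    y: float  # top edge
    w: float  # width
    h: float  # height


Layout = List[Element]


def serialize(layout: Sequence[Element]) -> str:
    """Hypothetical serialized format: one 'label x y w h' line per element."""
    return "\n".join(f"{e.label} {e.x:g} {e.y:g} {e.w:g} {e.h:g}" for e in layout)


def iou(a: Element, b: Element) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix = max(0.0, min(a.x + a.w, b.x + b.w) - max(a.x, b.x))
    iy = max(0.0, min(a.y + a.h, b.y + b.h) - max(a.y, b.y))
    inter = ix * iy
    union = a.w * a.h + b.w * b.h - inter
    return inter / union if union > 0 else 0.0


def toy_similarity(query: Sequence[Element], candidate: Sequence[Element]) -> float:
    """Stand-in retrieval score (NOT LTSim): for each query element, take the
    best IoU among candidate elements with the same label, then average."""
    if not query:
        return 0.0
    total = 0.0
    for q in query:
        total += max((iou(q, c) for c in candidate if c.label == q.label), default=0.0)
    return total / len(query)


def retrieve_top_k(query: Layout, bank: Sequence[Layout], k: int) -> List[Layout]:
    """Return the k bank layouts that score highest against the query."""
    return sorted(bank, key=lambda cand: toy_similarity(query, cand), reverse=True)[:k]


# Stage instructions paraphrase the three refinement stages; wording is assumed.
STAGES = (
    "Stage 1: adjust element positions.",
    "Stage 2: adjust element sizes and placement.",
    "Stage 3: apply fine-grained final adjustments.",
)


def layout_cot(constraints: str, exemplars: Sequence[Layout],
               llm: Callable[[str], str]) -> str:
    """Coarse generation followed by one LLM call per refinement stage."""
    context = "\n\n".join(serialize(e) for e in exemplars)
    layout = llm(f"Exemplars:\n{context}\n\nConstraints:\n{constraints}\n\n"
                 "Propose a coarse layout.")
    for stage in STAGES:
        layout = llm(f"Exemplars:\n{context}\n\nCurrent layout:\n{layout}\n\n{stage}")
    return layout
```

With a real model, `llm` would wrap a chat-completion call; passing it in as a plain callable keeps the sketch independent of any particular provider and makes the four-call structure (one coarse pass plus three refinement stages) easy to inspect.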

Paper Structure

This paper contains 32 sections, 10 equations, 5 figures, 5 tables.

Figures (5)

  • Figure 1: Qualitative evaluation of the multi-stage LayoutCoT framework. Each successive stage systematically refines the spatial arrangement of layout elements, resulting in designs with enhanced coherence and rationality.
  • Figure 2: Overview of the LayoutCoT framework. Our training-free approach initially employs a Layout-aware RAG to prompt the LLM for a coarse layout prediction, establishing a logical and visually coherent arrangement of elements. This is subsequently refined via a multi-stage Chain-of-Thought (CoT) module, which iteratively enhances the layout by resolving spatial conflicts and fine-tuning dimensions. The proposed framework is versatile and applicable to a wide array of layout generation tasks.
  • Figure 3: Details of the CoT Module. We illustrate the overall conversational logic of the multi-stage CoT. Depending on the type of task, the details vary slightly to adapt to specific requirements. Some of the LayoutCoT prompts can be found in the supplementary material.
  • Figure 4: Qualitative Results for the Content-aware Layout Generation Task. LayoutPrompter and LayoutCoT$^\dag$ tend to generate dense, small boxes in the upper-left corner, which is unreasonable. In contrast, LayoutCoT effectively corrects this error and achieves satisfactory results.
  • Figure 5: Qualitative Results for the Constraint-Explicit Layout Generation Task. Because the RICO dataset features a wide variety of label categories, its completion task is more challenging than the other task types, so we visualize results on the RICO completion task. LayoutCoT produces more rational layouts, with better handling of element overlap and alignment than other methods.