LLMs as Layout Designers: Enhanced Spatial Reasoning for Content-Aware Layout Generation

Sha Li, Stefano Petrangeli, Yu Shen, Xiang Chen, Naren Ramakrishnan

TL;DR

LaySPA introduces a reinforcement learning framework that augments LLM agents with explicit spatial reasoning to tackle content-aware layout generation. By formulating the task as policy learning and using a hybrid reward system ($R_{\text{format}}$, $R_{\text{quality}}$, and $R_{\text{IoU}}$) optimized with Group Relative Policy Optimization (GRPO), the approach produces layouts with accurate geometries, balanced distribution, and coherent hierarchies, while exposing an interpretable reasoning trace. Experiments on the CGL and PKU-PosterLayout datasets demonstrate substantial gains over baseline LLMs and competitive performance with specialized models, achieving data-efficient results with as few as 3k annotated examples. The work advances autonomous, space-aware design with potential extensions to richer visual semantics, multi-turn design interactions, and broader applications such as user interfaces and magazines.

Abstract

While Large Language Models (LLMs) have demonstrated impressive reasoning and planning abilities in textual domains and can effectively follow instructions for complex tasks, their ability to understand and manipulate spatial relationships remains limited. Such capabilities are crucial for content-aware graphic layout design, where the goal is to arrange heterogeneous elements on a canvas so that the final design remains visually balanced and structurally feasible. This problem requires precise coordination of placement, alignment, and structural organization of multiple elements within a constrained visual space. To address this limitation, we introduce LaySPA, a reinforcement learning-based framework that augments LLM-based agents with explicit spatial reasoning capabilities for layout design. LaySPA employs hybrid reward signals that jointly capture geometric constraints, structural fidelity, and visual quality, enabling agents to navigate the canvas, model inter-element relationships, and optimize spatial arrangements. Through group-relative policy optimization, the agent generates content-aware layouts that reflect salient regions and respect spatial constraints, and it produces both an interpretable reasoning trace explaining placement decisions and a structured layout specification. Experimental results show that LaySPA substantially improves the generation of structurally valid and visually appealing layouts, outperforming larger general-purpose LLMs and achieving performance comparable to state-of-the-art specialized layout models.
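
To make the hybrid reward and the group-relative optimization step more concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes boxes are given as `(x1, y1, x2, y2)` tuples, that the three reward terms are combined by a weighted sum (the weights and the function name `hybrid_reward` are hypothetical), and that advantages follow the usual GRPO recipe of normalizing each sampled layout's reward by the mean and standard deviation of its group. The paper's actual reward definitions may differ.

```python
from statistics import mean, pstdev


def box_iou(a, b):
    """Intersection-over-union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def hybrid_reward(r_format, r_quality, r_iou, weights=(1.0, 1.0, 1.0)):
    """Combine the three reward terms.

    A weighted sum is an assumption made for illustration only.
    """
    w_f, w_q, w_i = weights
    return w_f * r_format + w_q * r_quality + w_i * r_iou


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize rewards within one group of sampled layouts.

    Each candidate is scored relative to its siblings, so no separate
    value network is needed. Population standard deviation is used here;
    that choice is an implementation detail, not taken from the paper.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


if __name__ == "__main__":
    predicted = (0.0, 0.0, 2.0, 2.0)
    reference = (1.0, 1.0, 3.0, 3.0)
    iou = box_iou(predicted, reference)
    # Three candidate layouts sampled for the same canvas, with made-up scores.
    group = [
        hybrid_reward(1.0, 0.4, iou),
        hybrid_reward(1.0, 0.7, iou),
        hybrid_reward(0.0, 0.2, iou),
    ]
    print(group_relative_advantages(group))
```

Candidates that score above their group's mean receive positive advantages and are reinforced, while those below the mean are discouraged.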

Paper Structure

This paper contains 14 sections, 2 equations, 4 figures, 2 tables.

Figures (4)

  • Figure 1: Given the input canvas and a set of elements (one logo, three text, and one underlay), GPT-5 is prompted to generate a poster where elements avoid saliency areas and the underlay decorates one text box. As shown, GPT-5 fails to correctly place elements to avoid salient regions, maintain structural coherence, and position the underlay for decoration.
  • Figure 2: Overview of the LaySPA framework. The agent iteratively generates candidate layouts, receives hybrid rewards, and refines its policy via GRPO. Black arrows indicate the fine-tuning process, while red arrows denote the inference flow with the learned policy model.
  • Figure 3: An illustration of how each layout quality score function guides the agent toward human-preferred designs (✔) while discouraging undesirable spatial arrangements (✘).
  • Figure 4: Visualization of generated layouts using different methods on (a) CGL and (b) PKU datasets. Qwen-7B results are generated with LaySPA fine-tuning.