LLMs as Layout Designers: Enhanced Spatial Reasoning for Content-Aware Layout Generation
Sha Li, Stefano Petrangeli, Yu Shen, Xiang Chen, Naren Ramakrishnan
TL;DR
LaySPA introduces a reinforcement learning framework that augments LLM agents with explicit spatial reasoning to tackle content-aware layout generation. By formulating the task as policy learning and using a hybrid reward system ($R_{format}$, $R_{quality}$, and $R_{IoU}$) optimized with Group Relative Policy Optimization (GRPO), the approach produces layouts with accurate geometries, balanced distribution, and coherent hierarchies, while exposing an interpretable reasoning trace. Experiments on the CGL and PKU-PosterLayout datasets demonstrate substantial gains over baseline LLMs and competitive performance with specialized models, achieving data-efficient results with as few as 3k annotated examples. The work advances autonomous, space-aware design with potential extensions to richer visual semantics, multi-turn design interactions, and broader applications such as user interfaces and magazines.
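To make the hybrid reward concrete, the following is a minimal sketch of how an IoU term between a predicted and a reference box could be computed and then mixed with the other two terms. The `(x, y, w, h)` box convention, the function names, and the weighted-sum combination with equal default weights are illustrative assumptions; this summary does not specify how LaySPA actually defines or combines $R_{format}$, $R_{quality}$, and $R_{IoU}$.

```python
from typing import Tuple

# Hypothetical box convention: (x, y, width, height) with the origin at the top-left.
Box = Tuple[float, float, float, float]


def box_iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    # Width and height of the overlap region, clamped at zero for disjoint boxes.
    inter_w = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    inter_h = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    inter = inter_w * inter_h
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0


def hybrid_reward(
    r_format: float,
    r_quality: float,
    r_iou: float,
    weights: Tuple[float, float, float] = (1.0, 1.0, 1.0),
) -> float:
    """Assumed combination: a plain weighted sum of the three reward terms."""
    w_f, w_q, w_i = weights
    return w_f * r_format + w_q * r_quality + w_i * r_iou
```

A weighted sum is only one plausible way to merge the terms; a gated scheme, in which the quality and IoU terms count only when the output format is valid, would be another.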
Abstract
While Large Language Models (LLMs) have demonstrated impressive reasoning and planning abilities in textual domains and can effectively follow instructions for complex tasks, their ability to understand and manipulate spatial relationships remains limited. Such capabilities are crucial for content-aware graphic layout design, where the goal is to arrange heterogeneous elements onto a canvas so that the final design remains visually balanced and structurally feasible. This problem requires precise coordination of placement, alignment, and structural organization of multiple elements within a constrained visual space. To address this limitation, we introduce LaySPA, a reinforcement learning-based framework that augments LLM-based agents with explicit spatial reasoning capabilities for layout design. LaySPA employs hybrid reward signals that jointly capture geometric constraints, structural fidelity, and visual quality, enabling agents to navigate the canvas, model inter-element relationships, and optimize spatial arrangements. Through group-relative policy optimization, the agent generates content-aware layouts that reflect salient regions and respect spatial constraints, and it produces both an interpretable reasoning trace explaining placement decisions and a structured layout specification. Experimental results show that LaySPA substantially improves the generation of structurally valid and visually appealing layouts, outperforming larger general-purpose LLMs and achieving performance comparable to state-of-the-art specialized layout models.
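The group-relative step mentioned above can be illustrated with a short sketch. In the standard GRPO formulation, several candidate outputs are sampled for the same prompt, each is scored, and every reward is normalized against the mean and standard deviation of its own group, so no separate value network is needed. The function name, the use of the population standard deviation, and the `eps` stabilizer are assumptions made for illustration, not details taken from LaySPA.

```python
from statistics import mean, pstdev
from typing import List


def group_relative_advantages(rewards: List[float], eps: float = 1e-8) -> List[float]:
    """Normalize each reward against the statistics of its sampling group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population standard deviation (an assumption)
    # eps avoids division by zero when every sample in the group scores the same.
    return [(r - mu) / (sigma + eps) for r in rewards]


# Example: rewards for four candidate layouts sampled for the same canvas.
advantages = group_relative_advantages([0.2, 0.5, 0.5, 0.8])
```

Layouts that score above their group's mean receive positive advantages and are reinforced, while those below the mean are discouraged; a group in which all candidates score identically contributes no learning signal.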
