Smaller But Better: Unifying Layout Generation with Smaller Large Language Models
Peirong Zhang, Jiaxin Zhang, Jiahuan Cao, Hongliang Li, Lianwen Jin
TL;DR
This work introduces LGGPT, a decoder-only LLM for unified layout generation across multiple tasks and domains. It adopts Arbitrary Layout Instruction (ALI) and Universal Layout Response (ULR) as a compact, uniform input/output template, and uses Interval Quantization Encoding (IQE) to preserve layout cues without placeholders. Trained with a 1.5B GPT2-XL backbone on heterogeneous instruction-tuning data spanning four domains and five datasets, LGGPT achieves competitive or state-of-the-art results on isolated tasks while remaining more efficient than much larger models. The authors demonstrate the necessity of LLMs for unified layout generation, show that IQE and ALI substantially boost performance, and identify 1.5B as a practical sweet spot between proficiency and computational cost. Overall, the work offers a practical, scalable path toward task- and domain-generic layout generation, with potential for extension to broader layout types and text-conditioned scenarios.
Abstract
We propose LGGPT, an LLM-based model tailored for unified layout generation. First, we propose Arbitrary Layout Instruction (ALI) and Universal Layout Response (ULR) as the uniform I/O template. ALI accommodates arbitrary layout generation task inputs across multiple layout domains, enabling LGGPT to unify task-generic and domain-generic layout generation, a setting hitherto unexplored. Together, ALI and ULR have a succinct structure that forgoes the superfluous tokens typically found in existing HTML-based formats, facilitating efficient instruction tuning and boosting unified generation performance. In addition, we propose an Interval Quantization Encoding (IQE) strategy that compresses ALI into a more condensed structure. IQE precisely preserves valid layout clues while eliminating the less informative placeholders, helping LGGPT capture complex and variable layout generation conditions during unified training. Experimental results demonstrate that LGGPT achieves superior or on-par performance compared to existing methods. Notably, LGGPT strikes a strong balance between proficiency and efficiency with a compact 1.5B-parameter LLM, which beats prior 7B or 175B models even in the most extensive and challenging unified scenario. Furthermore, we underscore the necessity of employing LLMs for unified layout generation and, by comparing LLMs of varying scales, suggest that 1.5B could be an optimal parameter size. Code is available at https://github.com/NiceRingNode/LGGPT.
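The abstract does not spell out how IQE removes placeholders, so the following is only an illustrative sketch of one plausible reading: each geometric attribute is quantized and then shifted into its own non-overlapping integer interval, so that a token's value alone reveals which attribute it encodes and unknown attributes can simply be omitted. The attribute set (`x`, `y`, `w`, `h`), the bin count `BINS = 128`, and the function names are assumptions made for illustration and are not taken from the paper or its code release.

```python
from typing import Dict, List, Optional

# Assumed for illustration: four geometric attributes, 128 bins each.
ATTRS = ("x", "y", "w", "h")
BINS = 128


def encode(attrs: Dict[str, Optional[float]], bins: int = BINS) -> List[int]:
    """Quantize known attributes (normalized to [0, 1]) and shift each
    into its own interval; unknown attributes emit no token at all."""
    tokens = []
    for i, name in enumerate(ATTRS):
        value = attrs.get(name)
        if value is None:
            continue  # no placeholder token for a missing attribute
        q = min(int(value * bins), bins - 1)  # quantize into 0..bins-1
        tokens.append(i * bins + q)           # shift into interval i
    return tokens


def decode(tokens: List[int], bins: int = BINS) -> Dict[str, int]:
    """Recover (attribute name, quantized value) from interval-shifted tokens."""
    return {ATTRS[t // bins]: t % bins for t in tokens}


if __name__ == "__main__":
    partial = {"x": 0.5, "y": None, "w": None, "h": 0.25}
    toks = encode(partial)
    print(toks)
    print(decode(toks))
```

Under these assumptions, a condition that specifies only `x` and `h` is encoded with two tokens instead of four, and decoding remains unambiguous because the intervals do not overlap.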
