PosterLlama: Bridging Design Ability of Language Model to Contents-Aware Layout Generation
Jaejung Seol, Seojun Kim, Jaejun Yoo
TL;DR
PosterLlama addresses the challenge of content-aware poster layout generation by reformatting layout elements as HTML sequences and leveraging the design knowledge embedded in large language models. It introduces a two-stage vision-language training pipeline with a visual encoder adapter and an HTML-based generation target, augmented by depth-guided data augmentation to mitigate data scarcity and inpainting artifacts. Empirical results show PosterLlama achieving state-of-the-art or competitive performance across graphic and content measures on CGL and PKU datasets, while demonstrating strong conditional generation capabilities and robustness to data leakage. The method offers practical utility through a one-click poster generation pipeline and proves effective even with limited data, making it suitable for real-world design workflows.
Abstract
Visual layout plays a critical role in graphic design fields such as advertising, posters, and web UI design. The recent trend towards content-aware layout generation through generative models has shown promise, yet it often overlooks the semantic intricacies of layout design by treating it as a simple numerical optimization. To bridge this gap, we introduce PosterLlama, a network designed for generating visually and textually coherent layouts by reformatting layout elements into HTML code and leveraging the rich design knowledge embedded within language models. Furthermore, we enhance the robustness of our model with a unique depth-based poster augmentation strategy. This ensures that our generated layouts remain both semantically rich and visually appealing, even with limited data. Our extensive evaluations across several benchmarks demonstrate that PosterLlama outperforms existing methods in producing authentic and content-aware layouts. It supports a wide range of conditions, including unconditional layout generation, element-conditional layout generation, and layout completion, which makes it a highly versatile tool for user-driven layout manipulation.
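To make the idea of reformatting layout elements into HTML code concrete, the following is a minimal sketch of how a layout could be serialized into an HTML-like string that a language model can read or generate. It is an illustration under stated assumptions, not the paper's actual template: the `Element` class, the `div`/`style` markup, the category names, the canvas size, and the `<M>` mask token are all hypothetical choices made for this example. The optional masking shows one plausible way to pose conditional tasks, where known attributes are kept and unknown ones are replaced by a placeholder for the model to fill in.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical placeholder token for attributes the model should predict.
MASK = "<M>"


@dataclass
class Element:
    """One layout element: a category label plus a bounding box in pixels."""
    category: str  # illustrative labels such as "text" or "logo"
    left: int
    top: int
    width: int
    height: int


def element_to_html(el: Element, mask_box: bool = False) -> str:
    """Serialize one element as a <div>; mask its box if mask_box is True."""
    values = [el.left, el.top, el.width, el.height]
    left, top, width, height = [MASK if mask_box else f"{v}px" for v in values]
    return (
        f'<div class="{el.category}" '
        f'style="left:{left}; top:{top}; width:{width}; height:{height}"></div>'
    )


def layout_to_html(elements: List[Element], canvas_w: int, canvas_h: int,
                   mask_box: bool = False) -> str:
    """Wrap all serialized elements in a body tag that records the canvas size."""
    rows = "\n".join("  " + element_to_html(e, mask_box) for e in elements)
    return (
        f'<body style="width:{canvas_w}px; height:{canvas_h}px">\n'
        f"{rows}\n"
        f"</body>"
    )


if __name__ == "__main__":
    layout = [Element("text", 10, 20, 100, 30), Element("logo", 5, 5, 40, 40)]
    # Fully specified layout, e.g. a training target.
    print(layout_to_html(layout, 513, 750))
    # Categories given, boxes masked, e.g. a prompt for element-conditional generation.
    print(layout_to_html(layout, 513, 750, mask_box=True))
```

Representing coordinates as attributes inside markup, rather than as a bare list of numbers, is what lets the model draw on the HTML and CSS it saw during pretraining; the exact tag and attribute choices above are interchangeable as long as they are applied consistently.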
