POSTA: A Go-to Framework for Customized Artistic Poster Generation

Haoyu Chen, Xiaojie Xu, Wenbo Li, Jingjing Ren, Tian Ye, Songhua Liu, Ying-Cong Chen, Lei Zhu, Xinchao Wang

TL;DR

POSTA tackles automated artistic poster generation by combining three components: diffusion-based background synthesis, an MLLM-driven design planning module, and a BrushNet-inspired artistic text stylization module. The models are trained on the two-part PosterArt dataset (PosterArt-Design and PosterArt-Text) and produce fully editable posters end to end, with text that is accurate and visually consistent with the background. Experiments show that POSTA outperforms existing methods in text accuracy, layout fidelity, and aesthetic quality, handles long text robustly, and supports flexible customization. The framework supports practical designer-oriented workflows across diverse poster tasks and leaves broader style coverage and a larger dataset to future work.

Abstract

Poster design is a critical medium for visual communication. Prior work has explored automatic poster design using deep learning techniques, but these approaches lack text accuracy, user customization, and aesthetic appeal, limiting their applicability in artistic domains such as movies and exhibitions, where both clear content delivery and visual impact are essential. To address these limitations, we present POSTA: a modular framework powered by diffusion models and multimodal large language models (MLLMs) for customized artistic poster generation. The framework consists of three modules. Background Diffusion creates a themed background based on user input. Design MLLM then generates layout and typography elements that align with and complement the background style. Finally, to enhance the poster's aesthetic appeal, ArtText Diffusion applies additional stylization to key text elements. The final result is a visually cohesive and appealing poster, with a fully modular process that allows for complete customization. To train our models, we develop the PosterArt dataset, comprising high-quality artistic posters annotated with layout, typography, and pixel-level stylized text segmentation. Our comprehensive experimental analysis demonstrates POSTA's exceptional controllability and design diversity, outperforming existing models in both text accuracy and aesthetic quality.
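The three-module flow described in the abstract can be sketched as a simple composition of stages. This is only an illustrative outline, not the authors' implementation: every name here (`TextElement`, `Poster`, `generate_poster`, and the `stub_*` functions) is hypothetical, and the stubs stand in for the real Background Diffusion, Design MLLM, and ArtText Diffusion models, which are not reproduced.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class TextElement:
    """One text item with its planned layout and typography (hypothetical schema)."""
    text: str
    box: Tuple[int, int, int, int]  # (x, y, width, height) in pixels
    font: str
    stylize: bool = False  # True for key text that gets artistic stylization


@dataclass
class Poster:
    """Editable result: each stage's output is kept separately."""
    background: str
    elements: List[TextElement]
    styled: Dict[str, str] = field(default_factory=dict)


def generate_poster(
    prompt: str,
    texts: List[str],
    background_fn: Callable[[str], str],
    design_fn: Callable[[str, List[str]], List[TextElement]],
    arttext_fn: Callable[[str, TextElement], str],
) -> Poster:
    # Stage 1: themed background from the user prompt.
    background = background_fn(prompt)
    # Stage 2: layout and typography planned against that background.
    elements = design_fn(background, texts)
    # Stage 3: extra stylization only for elements flagged as key text.
    styled = {e.text: arttext_fn(background, e) for e in elements if e.stylize}
    return Poster(background, elements, styled)


# Placeholder stages (assumptions for illustration only; no real models are called).
def stub_background(prompt: str) -> str:
    return f"bg<{prompt}>"


def stub_design(background: str, texts: List[str]) -> List[TextElement]:
    # Stack texts vertically; treat the first one as the title to stylize.
    return [
        TextElement(t, (40, 40 + 120 * i, 600, 100), "serif", stylize=(i == 0))
        for i, t in enumerate(texts)
    ]


def stub_arttext(background: str, element: TextElement) -> str:
    return f"styled<{element.text}> on {background}"


poster = generate_poster(
    "noir movie", ["Title", "Tagline"], stub_background, stub_design, stub_arttext
)
```

Because each stage is passed in as a separate callable and its output is stored on its own field, any single stage can be swapped or its result edited by hand without rerunning the others, which mirrors the modular customization the abstract emphasizes.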

Paper Structure

This paper contains 23 sections, 1 equation, 13 figures.

Figures (13)

  • Figure 1: Generated results using our POSTA framework. The background, layout, and typographical designs are fully crafted from text inputs, showcasing the framework's capability to produce cohesive and visually engaging elements solely through textual guidance.
  • Figure 2: Our motivation stems from limitations of current methods for poster generation, which often struggle with issues like text inaccuracy, limited customization, and insufficient aesthetic quality.
  • Figure 3: Overview of PosterArt-Text dataset. It contains extensive segmentation and corresponding descriptions of texts with diverse artistic styles, primarily sourced from artistic posters such as those for movies, album covers, and similar media.
  • Figure 4: A sample of PosterArt-Design (top). PosterArt-Design vs. previous layout datasets (bottom). Our dataset is crafted by expert designers who carefully incorporate elements into backgrounds, including deliberate layout and typography information.
  • Figure 5: Our POSTA pipeline consists of three steps: background generation, design planning, and artistic text stylization. Background Diffusion and ArtText Diffusion are employed to generate backgrounds and text with artistic effects, while the Design MLLM predicts layout and typography information. The GPT-4V-powered Magic Prompter is used to refine prompts based on user descriptions or background images, optimizing input for the diffusion models.
  • ...and 8 more figures