PosterOmni: Generalized Artistic Poster Creation via Task Distillation and Unified Reward Feedback

Sixiang Chen, Jianyu Lai, Jialin Gao, Hengyu Shi, Zhongying Liu, Tian Ye, Junfeng Luo, Xiaoming Wei, Lei Zhu

TL;DR

PosterOmni addresses the challenge of open, multi-task image-to-poster generation by unifying local editing and global creation under a single framework. It introduces a fully automated data pipeline (PosterOmni-200K and PosterOmni-Bench), task-distillation-based fine-tuning to merge local and global experts, and a unified PosterOmni Reward Model with Omni-Edit RL using DiffusionNFT. The approach yields state-of-the-art performance among open-source models and rivals some proprietary systems, improving reference adherence, layout coherence, and aesthetic harmony across six poster tasks. The work provides an end-to-end open framework, data, and benchmarks to advance automated, design-aware poster generation and potentially other graphic-design tasks.

Abstract

Image-to-poster generation is a high-demand task requiring not only local adjustments but also high-level design understanding. Models must generate text, layout, style, and visual elements while preserving semantic fidelity and aesthetic coherence. The process spans two regimes: local editing, where ID-driven generation, rescaling, filling, and extending must preserve concrete visual entities; and global creation, where layout- and style-driven tasks rely on understanding abstract design concepts. These intertwined demands make image-to-poster a multi-dimensional process coupling entity-preserving editing with concept-driven creation under image-prompt control. To address these challenges, we propose PosterOmni, a generalized artistic poster creation framework that unlocks the potential of a base edit model for multi-task image-to-poster generation. PosterOmni integrates the two regimes, namely local editing and global creation, within a single system through an efficient data-distillation-reward pipeline: (i) constructing multi-scenario image-to-poster datasets covering six task types across entity-based and concept-based creation; (ii) distilling knowledge between local and global experts for supervised fine-tuning; and (iii) applying unified PosterOmni Reward Feedback to jointly align visual entity preservation and aesthetic preference across all tasks. Additionally, we establish PosterOmni-Bench, a unified benchmark for evaluating both local editing and global creation. Extensive experiments show that PosterOmni significantly enhances reference adherence, global composition quality, and aesthetic harmony, outperforming all open-source baselines and even surpassing several proprietary systems.
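
The two-regime task split described in the abstract can be written down as a small lookup table. The sketch below is illustrative only: the `Regime` enum, the `TASK_REGIME` mapping, the `tasks_in` helper, and the identifier spellings are assumptions made for this example and are not taken from the PosterOmni code or data release.

```python
from enum import Enum
from typing import List


class Regime(Enum):
    LOCAL_EDITING = "local editing"      # must preserve concrete visual entities
    GLOBAL_CREATION = "global creation"  # relies on abstract design concepts


# The six task types named in the abstract, grouped by regime.
# Key spellings are hypothetical; only the grouping comes from the text.
TASK_REGIME = {
    "id_driven": Regime.LOCAL_EDITING,
    "rescaling": Regime.LOCAL_EDITING,
    "filling": Regime.LOCAL_EDITING,
    "extending": Regime.LOCAL_EDITING,
    "layout_driven": Regime.GLOBAL_CREATION,
    "style_driven": Regime.GLOBAL_CREATION,
}


def tasks_in(regime: Regime) -> List[str]:
    """Return the task names that belong to the given regime, sorted."""
    return sorted(name for name, r in TASK_REGIME.items() if r is regime)
```

Calling `tasks_in(Regime.GLOBAL_CREATION)` lists the two concept-driven tasks, and `tasks_in(Regime.LOCAL_EDITING)` lists the four entity-preserving ones.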

Paper Structure

This paper contains 27 sections, 20 equations, 25 figures, 6 tables.

Figures (25)

  • Figure 1: PosterOmni unifies local editing and global creation within a single image-to-poster generation framework. It covers six representative tasks—extending, filling, rescaling, identity-driven, layout-driven, and style-driven poster generation—enabling the model to achieve both fine-grained visual editing and holistic aesthetic composition.
  • Figure 2: We decompose image-to-poster generation into local editing and global creation, including extending, filling, rescaling, identity-driven, layout-driven, and style-driven generation. Our overall pipeline integrates prompt generation, image generation, multimodal filtering, and task-specific construction into a unified framework for large-scale, image-to-poster data generation. We then propose PosterOmni-200K and PosterOmni-Bench, which encompass six major poster themes and multi-image input scenarios.
  • Figure 3: PosterOmni datasets cover six poster themes (products, foods, events/travel, nature, education, and entertainment) and support both local editing and global creation tasks.
  • Figure 4: PosterOmni training workflow through four stages: (i) task-specific SFT for local and global experts, (ii) task distillation to integrate them into a single PosterOmni-SFT model, (iii) reward training for the unified PosterOmni Reward $R_{\text{omni}}$, and (iv) Omni-Edit RL using DiffusionNFT to align creation with human-preferred aesthetics and precision. For clarity, only one task is illustrated in (iii) and (iv).
  • Figure 5: Visual comparison of different model outputs. Red boxes highlight errors and distorted entities, while yellow boxes indicate incorrect or missing text elements. Compared with other methods, ours completes all image-to-poster tasks more effectively while also achieving excellent aesthetic quality.
  • ...and 20 more figures