Manga Generation via Layout-controllable Diffusion
Siyu Chen, Dengjie Li, Zenghao Bao, Yao Zhou, Lingfeng Tan, Yujie Zhong, Zheng Zhao
TL;DR
The paper tackles generating multi-panel manga pages from plain text by introducing Manga109Story as a captioned, story-aligned extension of Manga109 and a diffusion-based generator, MangaDiffusion, that models intra-panel and inter-panel interactions. It uses a two-step pipeline: segmenting an input story with an LLM into per-panel scripts and then generating panels with a Transformer-based diffusion model while masking speech bubbles to reduce clutter. The approach achieves controlled panel counts and diverse, coherent layouts, demonstrating strong quantitative results (FID and CLIP-I) and qualitative layout consistency, while noting data limitations and room for improvement in cross-panel coherence and character consistency. The work provides a practical path to convert textual narratives into engaging manga content and offers datasets and architectural insights for future manga generation research.
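The two-step data flow described above can be illustrated with a minimal sketch. It is not the authors' implementation: `segment_story` is a hypothetical rule-based stand-in for the LLM segmentation step, `mask_speech_bubbles` assumes that speech bubbles are given as axis-aligned pixel boxes and are simply painted over, and the diffusion model itself is omitted. All names, the `PanelScript` type, and the box format are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Assumed box format: (x0, y0, x1, y1) in pixels, end-exclusive.
Box = Tuple[int, int, int, int]


@dataclass
class PanelScript:
    index: int  # position of the panel on the page
    text: str   # script assigned to that panel


def segment_story(story: str, num_panels: int) -> List[PanelScript]:
    """Hypothetical stand-in for the LLM step: split the story's
    sentences into `num_panels` contiguous, near-equal groups."""
    sentences = [s.strip() for s in story.split(".") if s.strip()]
    if not sentences or num_panels < 1:
        raise ValueError("need a non-empty story and at least one panel")
    num_panels = min(num_panels, len(sentences))
    base, extra = divmod(len(sentences), num_panels)
    scripts, start = [], 0
    for i in range(num_panels):
        size = base + (1 if i < extra else 0)  # earlier panels take the remainder
        chunk = sentences[start:start + size]
        scripts.append(PanelScript(index=i, text=". ".join(chunk) + "."))
        start += size
    return scripts


def mask_speech_bubbles(page: List[List[int]], boxes: List[Box],
                        fill: int = 255) -> List[List[int]]:
    """Return a copy of a grayscale page (list of pixel rows) with every
    bubble box painted in `fill`; boxes are clipped to the page."""
    masked = [row[:] for row in page]
    height = len(masked)
    width = len(masked[0]) if masked else 0
    for x0, y0, x1, y1 in boxes:
        for y in range(max(0, y0), min(height, y1)):
            for x in range(max(0, x0), min(width, x1)):
                masked[y][x] = fill
    return masked


if __name__ == "__main__":
    story = ("A boy finds a sword. He shows it to his friend. "
             "A monster appears. They fight. They win.")
    for script in segment_story(story, 3):
        print(script.index, script.text)
    print(mask_speech_bubbles([[0] * 4 for _ in range(3)], [(1, 0, 3, 2)]))
```

In a real system, the per-panel scripts would condition the layout-controllable diffusion model, and the masking would be applied wherever the paper specifies; the sketch only shows the shape of the data passed between the two steps.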
Abstract
Generating comics from text has been widely studied. However, few studies address generating multi-panel manga (Japanese comics) solely from plain text. A manga page contains multiple panels and is characterized by coherent storytelling, reasonable and diverse page layouts, consistent characters, and semantic correspondence between panel drawings and panel scripts. Generating manga is therefore a significant challenge. This paper presents the manga generation task and constructs the Manga109Story dataset for studying manga generation solely from plain text. Additionally, we propose MangaDiffusion to facilitate intra-panel and inter-panel information interaction during the manga generation process. The results show that our method is particularly effective at controlling the number of panels and producing reasonable and diverse page layouts. Our approach has the potential to convert a large amount of textual stories into more engaging manga, which suggests significant application prospects.
