StoryWeaver: A Unified World Model for Knowledge-Enhanced Story Character Customization
Jinlu Zhang, Jiji Tang, Rongsheng Zhang, Tangjie Lv, Xiaoshuai Sun
TL;DR
StoryWeaver tackles knowledge-rich story visualization by introducing a Character Graph (CG) that encodes fine-grained semantic knowledge (characters, their attributes, and the relationships between them) and a customization pipeline, Customization via Character Graph (C-CG), that converts the CG into scene captions for a diffusion model. To address identity blending in multi-character generation, it adds Knowledge-Enhanced Spatial Guidance (KE-SG), which adjusts cross-attention maps using spatial priors derived from CG knowledge. The method is validated on a new benchmark, TBC-Bench, built around Pororo and Frozen characters. It shows substantial gains in identity preservation (DINO-I) and text-semantic alignment (CLIP-T) on single-character tasks and improved Frame-Accuracy and Character F1 on multi-character tasks, while remaining storage-efficient. The authors also report extensive ablations and a user study, and release code and datasets to facilitate further research. Overall, StoryWeaver demonstrates that structured knowledge representations combined with spatially guided attention can improve identity fidelity and semantic alignment together rather than trading one for the other.
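The summary says only that KE-SG adjusts cross-attention maps using spatial priors derived from CG knowledge; the exact formulation is not given here. As a rough illustration, the sketch below assumes one common way to do this: add a bias to the pre-softmax cross-attention scores so that each character token is favored inside its region mask and suppressed outside it, then normalize over tokens for each pixel. The function `spatially_guided_attention`, the `token_regions` mask format, and the `strength` parameter are all hypothetical and are not StoryWeaver's API.

```python
import math
from typing import Dict, List


def softmax(row: List[float]) -> List[float]:
    """Numerically stable softmax over one row of scores."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def spatially_guided_attention(
    logits: List[List[float]],
    token_regions: Dict[int, List[int]],
    strength: float = 2.0,
) -> List[List[float]]:
    """Hypothetical region-biased cross-attention (not StoryWeaver's exact KE-SG).

    logits:        [num_pixels][num_tokens] pre-softmax query-key scores.
    token_regions: character token index -> binary mask over pixels
                   (1 = pixel lies inside that character's spatial prior).
    strength:      hypothetical bias magnitude.
    Returns attention probabilities, normalized over tokens for each pixel.
    """
    guided = []
    for p, row in enumerate(logits):
        biased = list(row)  # copy so the caller's scores are not mutated
        for t, mask in token_regions.items():
            # Favor the character token inside its region, suppress it outside.
            biased[t] += strength if mask[p] else -strength
        guided.append(softmax(biased))
    return guided


if __name__ == "__main__":
    # Token 0: a generic word; token 1: "Pororo"; token 2: "Crong".
    # Pixel 0 lies in Pororo's region, pixel 1 in Crong's.
    scores = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
    attn = spatially_guided_attention(scores, {1: [1, 0], 2: [0, 1]})
    for p, row in enumerate(attn):
        print(p, [round(a, 3) for a in row])
```

Starting from uniform scores, the bias makes the "Pororo" token dominate pixel 0 and the "Crong" token dominate pixel 1, which is the kind of per-character separation that spatial guidance aims for when several identities share one image. Biasing logits before the softmax, rather than editing probabilities afterward, keeps every pixel's attention a valid distribution.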
Abstract
Story visualization has gained increasing attention in artificial intelligence. However, existing methods still struggle to balance character identity preservation against text-semantics alignment, largely because they lack detailed semantic modeling of the story scene. To tackle this challenge, we propose a novel knowledge graph, namely the Character Graph (\textbf{CG}), which comprehensively represents story-related knowledge, including the characters, their attributes, and the relationships between characters. We then introduce StoryWeaver, an image generator that achieves Customization via Character Graph (\textbf{C-CG}) and is capable of consistent story visualization with rich text semantics. To further improve multi-character generation, we incorporate knowledge-enhanced spatial guidance (\textbf{KE-SG}) into StoryWeaver to precisely inject character semantics into generation. To validate the effectiveness of our method, we conduct extensive experiments on a new benchmark called TBC-Bench. The experiments confirm that StoryWeaver excels not only in creating vivid visual story plots but also in accurately conveying character identities across various scenarios with considerable storage efficiency, \emph{e.g.}, achieving average increases of +9.03\% in DINO-I and +13.44\% in CLIP-T. Furthermore, ablation experiments are conducted to verify the contribution of each proposed module. Code and datasets are released at https://github.com/Aria-Zhangjl/StoryWeaver.
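To make the Character Graph concrete, here is a minimal sketch that assumes a simple layout: each character node carries a list of attribute phrases, and relationships are stored as (subject, relation, object) edges. The `CharacterGraph` class and its template-based `to_caption` method are hypothetical stand-ins for the C-CG step, not the authors' implementation; the released repository is the reference for how StoryWeaver actually builds its captions.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class CharacterGraph:
    """Hypothetical Character Graph: character nodes with attributes, plus relation edges."""

    # character name -> attribute phrases (e.g. expression, action, clothing)
    attributes: Dict[str, List[str]] = field(default_factory=dict)
    # directed (subject, relation, object) edges between characters
    relations: List[Tuple[str, str, str]] = field(default_factory=list)

    def add_character(self, name: str, *attrs: str) -> None:
        self.attributes.setdefault(name, []).extend(attrs)

    def relate(self, subject: str, relation: str, obj: str) -> None:
        # Only allow edges between characters that already exist in the graph.
        for node in (subject, obj):
            if node not in self.attributes:
                raise KeyError(f"unknown character: {node}")
        self.relations.append((subject, relation, obj))

    def to_caption(self) -> str:
        """Naive template rendering (a stand-in for C-CG, not the paper's method)."""
        parts = [f"{n}, {', '.join(a)}" if a else n for n, a in self.attributes.items()]
        caption = "; ".join(parts)
        clauses = [f"{s} {r} {o}" for s, r, o in self.relations]
        if clauses:
            caption += ". " + "; ".join(clauses)
        return caption + "."
```

For example, after `add_character("Pororo", "smiling")`, `add_character("Crong", "waving")`, and `relate("Pororo", "stands next to", "Crong")`, calling `to_caption()` returns `"Pororo, smiling; Crong, waving. Pororo stands next to Crong."`. A template this simple ignores word order and phrasing quality; it only illustrates how structured nodes and edges can be flattened into text for a text-to-image model.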
