StoryWeaver: A Unified World Model for Knowledge-Enhanced Story Character Customization

Jinlu Zhang, Jiji Tang, Rongsheng Zhang, Tangjie Lv, Xiaoshuai Sun

TL;DR

StoryWeaver tackles knowledge-rich story visualization by introducing a Character Graph (CG) that encodes fine-grained semantic knowledge (characters, attributes, events) and a customization pipeline (C-CG) that converts the CG into scene captions for a diffusion model. To address identity blending in multi-character generation, it adds Knowledge-Enhanced Spatial Guidance (KE-SG), which adjusts cross-attention maps using spatial priors derived from CG knowledge. The method is validated on a new benchmark, TBC-Bench, built around Pororo and Frozen characters. It shows substantial gains in identity preservation (DINO-I) and text-semantic alignment (CLIP-T) on single-character tasks, and improved Frame-Accuracy and Character F1 on multi-character tasks, with notable storage efficiency. The authors also provide extensive ablations and a user study, and release code and datasets to facilitate further research. Overall, StoryWeaver demonstrates that structured knowledge representations coupled with spatially guided attention can improve both identity fidelity and semantic alignment in story visualization within a single model.
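The idea of turning a Character Graph into a scene caption can be illustrated with a minimal Python sketch. This is not the authors' implementation: the class names, the (subject, relation, object) triple layout, the caption template, and the example attributes are all assumptions made for illustration only.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class CharacterNode:
    """A character plus its attribute phrases (hypothetical structure)."""
    name: str
    attributes: list[str] = field(default_factory=list)


@dataclass
class CharacterGraph:
    """Toy graph: character nodes and (subject, relation, object) event triples."""
    characters: dict[str, CharacterNode] = field(default_factory=dict)
    events: list[tuple[str, str, str]] = field(default_factory=list)

    def add_character(self, name: str, attributes: list[str] | None = None) -> None:
        self.characters[name] = CharacterNode(name, list(attributes or []))

    def add_event(self, subject: str, relation: str, obj: str) -> None:
        self.events.append((subject, relation, obj))

    def describe(self, name: str) -> str:
        # Attach attributes in parentheses when the node has any.
        node = self.characters.get(name)
        if node is None or not node.attributes:
            return name
        return f"{name} ({', '.join(node.attributes)})"

    def to_caption(self) -> str:
        # One clause per event; clauses are joined with "; ".
        clauses = [
            f"{self.describe(s)} {r} {self.describe(o)}"
            for s, r, o in self.events
        ]
        return "; ".join(clauses)


# Illustrative data only; the attributes are not taken from TBC-Bench.
cg = CharacterGraph()
cg.add_character("Pororo", ["penguin", "goggles"])
cg.add_character("Crong")
cg.add_event("Pororo", "waves at", "Crong")
caption = cg.to_caption()
print(caption)
```

In a real pipeline the resulting caption would be passed to the diffusion model as its text prompt; how StoryWeaver actually serializes the graph is not specified in this summary.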

Abstract

Story visualization has gained increasing attention in artificial intelligence. However, existing methods still struggle to balance character identity preservation with text-semantics alignment, largely due to a lack of detailed semantic modeling of the story scene. To tackle this challenge, we propose a novel knowledge graph, the Character Graph (CG), which comprehensively represents story-related knowledge, including the characters, the attributes related to characters, and the relationships between characters. We then introduce StoryWeaver, an image generator that achieves Customization via Character Graph (C-CG) and is capable of consistent story visualization with rich text semantics. To further improve multi-character generation, we incorporate knowledge-enhanced spatial guidance (KE-SG) into StoryWeaver to precisely inject character semantics into generation. To validate the effectiveness of the proposed method, we conduct extensive experiments on a new benchmark called TBC-Bench. The experiments confirm that StoryWeaver excels not only in creating vivid visual story plots but also in accurately conveying character identities across various scenarios with considerable storage efficiency, e.g., achieving an average increase of +9.03% in DINO-I and +13.44% in CLIP-T. Furthermore, ablation experiments verify the effectiveness of the proposed modules. Code and datasets are released at https://github.com/Aria-Zhangjl/StoryWeaver.
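As a rough illustration of what spatial guidance on cross-attention can look like, the NumPy sketch below strengthens each character token's attention inside an assigned region and weakens it elsewhere, then renormalizes over tokens at every spatial location. The function name, the `boost` and `damp` factors, and the use of boolean region masks are assumptions; the exact KE-SG formulation is not given in this excerpt.

```python
import numpy as np


def guide_cross_attention(attn, token_regions, boost=2.0, damp=0.5):
    """Reweight cross-attention maps with per-token spatial priors.

    attn:          array of shape (num_tokens, H, W), non-negative weights.
    token_regions: dict mapping a token index to a boolean (H, W) mask that
                   marks where that token's character is expected to appear.
    boost, damp:   hypothetical scale factors inside / outside the region.
    """
    guided = attn.astype(float).copy()
    for tok, mask in token_regions.items():
        # Amplify attention inside the prior region, suppress it outside.
        guided[tok] = np.where(mask, guided[tok] * boost, guided[tok] * damp)
    # Renormalize so the weights over tokens sum to 1 at each location.
    total = guided.sum(axis=0, keepdims=True)
    return guided / np.clip(total, 1e-12, None)


# Toy example: 3 tokens on a 2x4 grid, uniform attention to start with.
attn = np.full((3, 2, 4), 1.0 / 3.0)
left = np.zeros((2, 4), dtype=bool)
left[:, :2] = True
right = ~left
guided = guide_cross_attention(attn, {1: left, 2: right})
```

After guidance, token 1 dominates token 2 on the left half of the grid and the reverse holds on the right half, which is the kind of separation intended to reduce identity blending between characters.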

Paper Structure

This paper contains 53 sections, 21 equations, 12 figures, 6 tables.

Figures (12)

  • Figure 1: Our StoryWeaver can achieve high-quality story visualization based on the given characters within a unified model.
  • Figure 2: The overall framework of StoryWeaver. (a) We propose the Character Graph to represent semantically rich knowledge within the story world. (b) We enhance StoryWeaver with the proposed spatial guidance to further improve multi-character generation.
  • Figure 3: Visual examples of the impact of Customization via Character Graph (C-CG) and Knowledge-Enhanced Spatial Guidance (KE-SG). (a) Without C-CG, the generator struggles to capture fine-grained character details. (b) Without KE-SG, the generator tends to allocate attention uniformly across all regions, resulting in identity blending.
  • Figure 4: Visual comparisons of different methods on single- and multi-character visual storytelling. StoryWeaver excels in character identity customization and semantic alignment.
  • Figure 5: The PororoSV dataset currently used for story visualization suffers from low resolution, blurry training samples, and issues with caption annotations. Together, these factors impede high-quality character customization.
  • ...and 7 more figures