Persistent Story World Simulation with Continuous Character Customization

Jinlu Zhang, Qiyun Wang, Baoxiang Du, Jiayi Ji, Jing He, Rongsheng Zhang, Tangjie Lv, Xiaoshuai Sun, Rongrong Ji

Abstract

Story visualization has gained increasing attention in computer vision. However, current methods often fail to combine accurate character customization, semantic alignment, and continuous integration of new identities. To tackle this challenge, we present EverTale, a story world simulator for continuous story character customization. We first propose an All-in-One-World Character Integrator that achieves continuous character adaptation within a single unified LoRA module, eliminating the per-character optimization modules required by previous methods. We then incorporate a Character Quality Gate via MLLM-as-Judge, which uses chain-of-thought reasoning to verify the fidelity of each character adaptation and to determine whether the model can proceed to the next character or requires additional training on the current one. We also introduce a Character-Aware Region-Focus Sampling strategy to address the identity degradation and layout conflicts found in existing multi-character visual storytelling; it produces natural multi-character generation more efficiently by harmonizing local character-specific details with global scene context. Experimental results show that EverTale outperforms a wide range of compared methods on both single- and multi-character story visualization. Code will be made available.
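The gating behavior described in the abstract (train on a character, ask a judge, then either move on or keep training) can be illustrated with a minimal control-flow sketch. This is not the authors' implementation: `integrate_characters`, `train_round`, `judge_passes`, and the `max_rounds` cap are hypothetical names and assumptions introduced only for illustration, and the actual LoRA training and MLLM judging are stood in for by caller-supplied callbacks.

```python
from typing import Callable, Dict, Iterable


def integrate_characters(
    characters: Iterable[str],
    train_round: Callable[[str], None],
    judge_passes: Callable[[str], bool],
    max_rounds: int = 3,
) -> Dict[str, int]:
    """Learn characters one after another with a single shared module.

    Returns, per character, how many training rounds were spent before
    moving on. All names here are hypothetical; `max_rounds` is an
    assumed safety cap that the abstract does not mention.
    """
    rounds_used: Dict[str, int] = {}
    for name in characters:
        rounds = 0
        while rounds < max_rounds:
            train_round(name)       # one more training pass on the current character
            rounds += 1
            if judge_passes(name):  # quality gate: proceed to the next character
                break
            # otherwise: the gate asks for additional training on this character
        rounds_used[name] = rounds
    return rounds_used
```

In this sketch the gate is a plain boolean callback; in the paper it is an MLLM-as-Judge with chain-of-thought reasoning, whose prompts and decision criteria are not reproduced here.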

Paper Structure (53 sections, 10 equations, 21 figures, 5 tables)

Figures (21)

  • Figure 1: EverTale adapts efficiently to new characters while preserving the identity of previously learned ones, e.g., the high-fidelity preservation of V2 when proceeding to V6 on Pororo ((b), $1^{st}$ row) and of V2 when integrating V4 on Frozen ((b), $2^{nd}$ row).
  • Figure 2: The overall framework of EverTale. (a) New characters are continually learned using the All-in-One-World Character Integrator. (b) The Character Quality Gate ensures per-character customization performance. (c) The Character-Aware Region-Focus Sampling strategy achieves accurate multi-character story visualization.
  • Figure 3: Visual comparison for single- and multi-character story visualization, where the arrows indicate the character learning sequence for CL-based methods. Our method excels in character identity preservation and text-semantic alignment in both settings.
  • Figure 4: Ablation visualizations showing the effectiveness of the different designs in EverTale.
  • Figure 5: The complete evaluation prompt generation instruction $T_e$ used in the Character Quality Gate via MLLM-as-Judge for guiding the multimodal large language model (MLLM) in generating a suitable evaluation prompt for EverTale.
  • ...and 16 more figures