Story3D-Agent: Exploring 3D Storytelling Visualization with Large Language Models

Yuzhou Huang, Yiran Qin, Shunlin Lu, Xintao Wang, Rui Huang, Ying Shan, Ruimao Zhang

TL;DR

Story3D-Agent presents an LLM-agent framework that translates narratives into 3D visualizations via procedural modeling. It introduces event-window decomposition, hierarchical action, motion, and decoration libraries, and textual and visual self-checks to maintain long-range narrative coherence in Blender-rendered scenes. The paper provides a detailed implementation including three Python libraries, evaluation criteria (Ins-Align, CLIP, ROUGE-L, BERT-based similarity), and ablation studies demonstrating the benefits of the hierarchical, self-check-enabled design. The work advances 3D storytelling by enabling precise control of multi-character actions and decorations, and by supporting story continuation with logical reasoning.

Abstract

Traditional visual storytelling is complex, requiring specialized knowledge and substantial resources, yet often constrained by human creativity and creation precision. While Large Language Models (LLMs) enhance visual storytelling, current approaches often limit themselves to 2D visuals or oversimplify stories through motion synthesis and behavioral simulation, failing to create comprehensive, multi-dimensional narratives. To this end, we present Story3D-Agent, a pioneering approach that leverages the capabilities of LLMs to transform provided narratives into 3D-rendered visualizations. By integrating procedural modeling, our approach enables precise control over multi-character actions and motions, as well as diverse decorative elements, ensuring the long-range and dynamic 3D representation. Furthermore, our method supports narrative extension through logical reasoning, ensuring that generated content remains consistent with existing conditions. We have thoroughly evaluated our Story3D-Agent to validate its effectiveness, offering a basic framework to advance 3D story representation.

Paper Structure (43 sections, 13 figures, 4 tables)

Figures (13)

  • Figure 1: We present Story3D-Agent, an innovative LLM-agent system designed for 3D storytelling visualization. The primary objective of the system is to transform a provided narrative into a corresponding 3D visualization. In this figure, we illustrate the narrative titled Race Day, rendered as a 3D visualization.
  • Figure 2: Overview of the proposed (a) Story3D-Agent and (b) Visual Self-check workflow. Our method divides a story into multiple parts, each serving as an event window. Using LLMs, we independently determine the corresponding storyline for each clip. These determinations are then compiled for the overall story model. Further, the accuracy of the system's determinations is improved by a multi-dimensional error correction mechanism.
  • Figure 3: The director, action, motion, and decoration agents are required to initially produce their respective outputs. Subsequently, these outputs are evaluated by the textual self-check mechanism. This mechanism not only confirms the correct responses but also initiates self-reflection and correction for potential errors. The process continues until all outputs are deemed error-free, at which point the determination process within the current event window is concluded.
  • Figure 4: We depict another vivid narrative titled Friendship, produced by our Story3D-Agent, which tells a story about the universal theme of friendship in a pre-defined garden-like scene.
  • Figure 5: We present the story continuation outcomes generated by our Story3D-Agent. Leveraging the rigorous logical reasoning capabilities of LLMs, the newly generated narratives could: 1) Preserve the coherence and consistency of the contextual story content. 2) Align with the stipulated conditions for the new narrative. 3) Implement all generated plots without introducing any misleading elements. The continuation results for the narratives Race Day and Friendship are provided for illustration.
  • ...and 8 more figures
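
The workflow described in the captions of Figures 2 and 3 (split the story into event windows, let each agent produce an output, then run a textual self-check and regenerate until nothing fails) can be sketched roughly as follows. This is only an illustrative sketch, not the authors' implementation: the function names, the fixed-size sentence splitting, the `max_rounds` cap, and the stub generator and checker are all assumptions made for the example. In the paper, LLM calls fill those roles, and the visual self-check is not modeled here.

```python
from typing import Callable, Dict, List, Optional

# Agent roles named in the Figure 3 caption.
AGENT_ROLES = ("director", "action", "motion", "decoration")

# Hypothetical signatures: generate(role, window, feedback) -> output text,
# check(role, window, output) -> error message, or None if the output passes.
Generate = Callable[[str, str, Optional[str]], str]
Check = Callable[[str, str, str], Optional[str]]


def split_into_event_windows(story: str, sentences_per_window: int = 2) -> List[str]:
    """Naively split a story into event windows of a fixed number of sentences."""
    sentences = [s.strip() for s in story.split(".") if s.strip()]
    return [
        ". ".join(sentences[i:i + sentences_per_window]) + "."
        for i in range(0, len(sentences), sentences_per_window)
    ]


def run_event_window(window: str, generate: Generate, check: Check,
                     max_rounds: int = 3) -> Dict[str, str]:
    """Generate one output per agent, then self-check and regenerate failures."""
    outputs = {role: generate(role, window, None) for role in AGENT_ROLES}
    for _ in range(max_rounds):  # assumed cap so the loop always terminates
        feedback = {role: check(role, window, outputs[role]) for role in AGENT_ROLES}
        failed = {role: msg for role, msg in feedback.items() if msg is not None}
        if not failed:
            break  # every output passed the textual self-check
        for role, msg in failed.items():
            outputs[role] = generate(role, window, msg)
    return outputs


def visualize_story(story: str, generate: Generate, check: Check) -> List[Dict[str, str]]:
    """Process each event window independently and compile the per-window plans."""
    return [run_event_window(w, generate, check)
            for w in split_into_event_windows(story)]


# Stub agents standing in for LLM calls (purely illustrative).
def stub_generate(role: str, window: str, feedback: Optional[str]) -> str:
    if role == "motion" and feedback is None:
        return "motion: <invalid>"  # deliberately wrong first attempt
    return f"{role}: plan for '{window}'"


def stub_check(role: str, window: str, output: str) -> Optional[str]:
    return "invalid output, please regenerate" if "<invalid>" in output else None


if __name__ == "__main__":
    for plan in visualize_story("A runs. B waves. C jumps.", stub_generate, stub_check):
        print(plan)
```

In this toy run the motion agent's first attempt fails the check, receives the feedback message, and is regenerated on the next round, while the other agents' outputs are kept as they are.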