Story3D-Agent: Exploring 3D Storytelling Visualization with Large Language Models
Yuzhou Huang, Yiran Qin, Shunlin Lu, Xintao Wang, Rui Huang, Ying Shan, Ruimao Zhang
TL;DR
Story3D-Agent is an LLM-agent framework that translates narratives into 3D visualizations via procedural modeling. It introduces event-window decomposition, hierarchical action, motion, and decoration libraries, and textual and visual self-checks to maintain long-range narrative coherence in Blender-rendered scenes. The paper describes the implementation in detail, including three Python libraries and the evaluation criteria (Ins-Align, CLIP, ROUGE-L, BERT-based similarity), and reports ablation studies showing the benefits of the hierarchical design and the self-checks. The work advances 3D storytelling by enabling precise control over multi-character actions and decorations and by supporting story continuation through logical reasoning.
Abstract
Traditional visual storytelling is complex: it requires specialized knowledge and substantial resources, and it is often constrained by the limits of human creativity and precision. While Large Language Models (LLMs) enhance visual storytelling, current approaches often limit themselves to 2D visuals or oversimplify stories through motion synthesis and behavioral simulation, failing to create comprehensive, multi-dimensional narratives. To this end, we present Story3D-Agent, a pioneering approach that leverages the capabilities of LLMs to transform provided narratives into 3D-rendered visualizations. By integrating procedural modeling, our approach enables precise control over multi-character actions and motions, as well as diverse decorative elements, yielding long-range, dynamic 3D representations. Furthermore, our method supports narrative extension through logical reasoning, ensuring that generated content remains consistent with existing conditions. We thoroughly evaluate Story3D-Agent to validate its effectiveness, offering a basic framework to advance 3D story representation.
