Dialogue Director: Bridging the Gap in Dialogue Visualization for Multimodal Storytelling
Min Zhang, Zilin Wang, Liyan Chen, Kunhong Liu, Juncong Lin
TL;DR
Dialogue Director tackles the challenge of translating dialogue-centric scripts into coherent, cinema-grade storyboards by introducing a training-free, three-agent pipeline that fuses language reasoning with diffusion-based visual generation. The Script Director performs structured extraction from scripts using Chain-of-Thought prompting and Retrieval-Augmented Generation, the Cinematographer generates consistent multi-view character visuals, and the Storyboard Maker composes cinematic layouts that respect perspective and shot design. Across real-world scripts, the framework achieves superior image quality and text–image alignment (measured by NIQE and CLIP-T, respectively) and receives strong human ratings for relationship, physical consistency, and cinematic knowledge, outperforming several state-of-the-art baselines. The approach is flexible and plug-and-play, enabling controllable, dialogue-driven storyboard production with improved narrative coherence and visual fidelity. The authors note limitations in highly dynamic shots and complex poses, which are left for future work.
Abstract
Recent advances in AI-driven storytelling have enhanced video generation and story visualization. However, translating dialogue-centric scripts into coherent storyboards remains a significant challenge due to limited script detail, inadequate physical context understanding, and the complexity of integrating cinematic principles. To address these challenges, we propose Dialogue Visualization, a novel task that transforms dialogue scripts into dynamic, multi-view storyboards. We introduce Dialogue Director, a training-free multimodal framework comprising a Script Director, Cinematographer, and Storyboard Maker. This framework leverages large multimodal models and diffusion-based architectures, employing techniques such as Chain-of-Thought reasoning, Retrieval-Augmented Generation, and multi-view synthesis to improve script understanding, physical context comprehension, and cinematic knowledge integration. Experimental results demonstrate that Dialogue Director outperforms state-of-the-art methods in script interpretation, physical world understanding, and cinematic principle application, significantly advancing the quality and controllability of dialogue-based story visualization.
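To make the three-stage data flow concrete, here is a minimal sketch of how a Script Director, Cinematographer, and Storyboard Maker might hand results to one another. It is an illustration only, not the authors' implementation: every class, function, and field name (Shot, CharacterViews, Panel, run_pipeline, the "SPEAKER: line" script format, the fixed "close-up" shot type, the view labels) is a hypothetical stand-in, and simple string handling replaces the large multimodal model and diffusion calls that the framework actually relies on.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Shot:
    speaker: str
    line: str
    shot_type: str  # hypothetical field; fixed to "close-up" in this sketch


@dataclass
class CharacterViews:
    name: str
    views: List[str]  # placeholder labels standing in for generated images


@dataclass
class Panel:
    shot: Shot
    view: str


def script_director(script: str) -> List[Shot]:
    """Stand-in for structured extraction; assumes 'SPEAKER: line' input."""
    shots = []
    for raw in script.strip().splitlines():
        if ":" not in raw:
            continue  # skip lines that are not dialogue in this toy format
        speaker, line = raw.split(":", 1)
        shots.append(Shot(speaker.strip(), line.strip(), "close-up"))
    return shots


def cinematographer(shots: List[Shot]) -> Dict[str, CharacterViews]:
    """Stand-in for multi-view synthesis: one fixed set of view labels per speaker."""
    names = sorted({s.speaker for s in shots})
    return {
        n: CharacterViews(n, [f"{n}/front", f"{n}/left", f"{n}/right"])
        for n in names
    }


def storyboard_maker(
    shots: List[Shot], views: Dict[str, CharacterViews]
) -> List[Panel]:
    """Stand-in for layout: cycles through views by shot index (arbitrary rule)."""
    panels = []
    for i, shot in enumerate(shots):
        available = views[shot.speaker].views
        panels.append(Panel(shot, available[i % len(available)]))
    return panels


def run_pipeline(script: str) -> List[Panel]:
    shots = script_director(script)
    views = cinematographer(shots)
    return storyboard_maker(shots, views)


if __name__ == "__main__":
    for panel in run_pipeline("ALICE: Did you see it?\nBOB: I did."):
        print(panel.shot.speaker, panel.shot.shot_type, panel.view)
```

Each stage consumes only the previous stage's output, which is one way to read the plug-and-play claim: any stage could in principle be swapped for a different model without changing the others.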
