An Exploratory Study on Multi-modal Generative AI in AR Storytelling
Hyungjun Doh, Jingyu Shi, Rahul Jain, Heesoo Kim, Karthik Ramani
TL;DR
This study defines a design space for multi-modal GenAI in AR storytelling by analyzing 223 AR videos and building a testbed that supports five modalities (Text, Audio, Image, Video, 3D) and four atomic storytelling elements (Character, Background, Sentiment, Development). Through two studies with 30 experienced storytellers, it investigates modality preferences, interaction with AI, and the quality of AI-generated content. Images suit characters and backgrounds well, while video supports development, although video quality often limits how well the output aligns with the storyteller's intent. Participants generally found co-creative AI interactions easy but noted that guiding outputs through prompts remains hard, underscoring the need for context-aware, selective augmentation and richer AR interactions. The work contributes a concrete design space, a functional testbed built on Motion Diffusion Model, Text2Video-Zero, Stable Diffusion, MusicGen, and DreamFusion, and actionable design recommendations for future AR storytelling systems that employ GenAI.
Abstract
Storytelling in AR has gained attention due to its multi-modality and interactivity. However, generating multi-modal content for AR Storytelling requires expertise and effort to convey the narrator's intention with high quality. Recently, Generative AI (GenAI) has shown promising applications in multi-modal content generation. Despite this potential benefit, current research calls for validating the effect of AI-generated content (AIGC) in AR Storytelling. Therefore, we conducted an exploratory study to investigate the use of GenAI in AR Storytelling. By analyzing 223 AR videos, we identified a design space for multi-modal AR Storytelling. Based on this design space, we developed a testbed that supports multi-modal content generation and the atomic elements of AR Storytelling. Through two studies with N=30 experienced storytellers and live presenters, we (1) revealed participants' preferences for modalities, (2) evaluated their interactions with AI to generate content, and (3) assessed the quality of the AIGC for AR Storytelling. We further discuss design considerations for future AR Storytelling with GenAI.
