ID.8: Co-Creating Visual Stories with Generative AI
Victor Nikhil Antony, Chien-Ming Huang
TL;DR
This paper presents ID.8, an open-source, end-to-end visual story authoring system that unifies text, visual, and audio generation in a human-in-the-loop workflow. The system combines a Storyline Creator with Leela (an LLM), a Storyboard, and a Scene Editor that draws on Stable Diffusion, AudioGen, and MusicGen, and the authors use it to demonstrate how multimodal generative AI can support iterative, co-creative storytelling. Two user studies report generally positive usability and support for creative exploration, but also reveal gaps in immersion, alignment, and perceived collaboration that guide design improvements. Beyond the working platform, the work contributes design guidelines for future multimodal, co-creative systems and emphasizes the importance of user-friendly prompting, safety, and a cohesive AI identity in human-AI collaboration.
Abstract
Storytelling is an integral part of human culture and significantly impacts cognitive and socio-emotional development and connection. Despite the importance of interactive visual storytelling, the process of creating such content requires specialized skills and is labor-intensive. This paper introduces ID.8, an open-source system designed for the co-creation of visual stories with generative AI. We focus on enabling an inclusive storytelling experience by simplifying the content creation process and allowing for customization. Our user evaluation confirms a generally positive user experience in domains such as enjoyment and exploration, while highlighting areas for improvement, particularly in immersiveness, alignment, and partnership between the user and the AI system. Overall, our findings indicate promising possibilities for empowering people to create visual stories with generative AI. This work contributes a novel content authoring system, ID.8, and insights into the challenges and potential of using generative AI for multimedia content creation.
