ID.8: Co-Creating Visual Stories with Generative AI

Victor Nikhil Antony, Chien-Ming Huang

TL;DR

This paper presents ID.8, an open-source, end-to-end visual story authoring system that unifies text, visuals, and audio generation through a human-in-the-loop workflow. By coordinating a Storyline Creator with Leela (an LLM), a Storyboard, and a Scene Editor that leverages Stable Diffusion, AudioGen, and MusicGen, the authors demonstrate how multimodal generative AI can support iterative, co-creative storytelling. Two user studies reveal generally positive usability and creative exploration, but also highlight gaps in immersion, alignment, and perceived collaboration, guiding design improvements. The work contributes not only an operational platform but also design guidelines for future multimodal, co-creative systems and emphasizes the importance of user-friendly prompting, safety, and cohesive AI identity in human-AI collaboration.

Abstract

Storytelling is an integral part of human culture and significantly impacts cognitive and socio-emotional development and connection. Despite the importance of interactive visual storytelling, the process of creating such content requires specialized skills and is labor-intensive. This paper introduces ID.8, an open-source system designed for the co-creation of visual stories with generative AI. We focus on enabling an inclusive storytelling experience by simplifying the content creation process and allowing for customization. Our user evaluation confirms a generally positive user experience in domains such as enjoyment and exploration, while highlighting areas for improvement, particularly in immersiveness, alignment, and partnership between the user and the AI system. Overall, our findings indicate promising possibilities for empowering people to create visual stories with generative AI. This work contributes a novel content authoring system, ID.8, and insights into the challenges and potential of using generative AI for multimedia content creation.

Paper Structure (41 sections, 7 figures, 3 tables)

Figures (7)

  • Figure 1: ID.8 lets the user (1) generate a story in collaboration with ChatGPT and (2) manually edit the story; it then (3) uses ChatGPT to generate a structured script and pre-populates the storyboard with scenes from the script.
  • Figure 2: ID.8 Storyboard allows for organization of the story flow by linking scenes and specifying how story viewer inputs should impact the flow of the story. Users access the Scene Editor module by double-clicking a scene node. Users can also preview their story.
  • Figure 3: (1) The ID.8 Scene Editor enables creation of prompts for text-to-image/audio models in collaboration with ChatGPT; (2) for character generation, ID.8 lets users select parts of the generated output to be used in the story; (3) ID.8 provides a simple interface for adding interaction with the viewer.
  • Figure 4: Results from Study 1: (a) SUS Scores, (b) MICSI Sub-Scale Scores, (c) Exploratory Question Responses.
  • Figure 5: Scenes from stories generated by participants using ID.8 in Study 1 and Study 2.
  • ...and 2 more figures