
Imagining from Images with an AI Storytelling Tool

Edirlei Soares de Lima, Marco A. Casanova, Antonio L. Furtado

TL;DR

This paper tackles automated storytelling from visual content by introducing ImageTeller, a prototype that uses the multimodal capabilities of GPT-4o and a Stable Diffusion XL-based illustrator to turn single images and image sequences into narrated, illustrated chapters. The authors implement a multi-agent pipeline (Visual Analyzer, Storywriter, Illustrator) coordinated by a Plot Manager, and support story-driven narratives in five genres, data-driven narratives, and a no-genre option. They demonstrate the tool through varied experiments, including Arthurian-inspired narratives, multilingual outputs, and data storytelling from charts of global plastic pollution. The work shows a practical route to image-to-narrative systems with interactive user control, with potential applications in education, entertainment, and visual humanities.

Abstract

A method for generating narratives by analyzing single images or image sequences is presented, inspired by the age-old tradition of Narrative Art. The proposed method explores the multimodal capabilities of GPT-4o to interpret visual content and create engaging stories, which are illustrated by a Stable Diffusion XL model. The method is supported by a fully implemented tool, called ImageTeller, which accepts images from diverse sources as input. Users can guide the narrative's development according to the conventions of fundamental genres (such as Comedy, Romance, Tragedy, Satire, or Mystery), opt to generate data-driven stories, or leave the prototype free to decide how to handle the narrative structure. User interaction is supported throughout the generation process: the user can request alternative chapters or illustrations, or even reject the story and restart generation from the same input. Additionally, users can attach captions to the input images, influencing the system's interpretation of the visual content. Examples of generated stories are provided, along with details on how to access the prototype.
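
The control flow summarized above (a Plot Manager coordinating a Visual Analyzer, a Storywriter, and an Illustrator, with the user able to request alternative chapters) can be sketched roughly as follows. This is an illustrative sketch only, not the authors' implementation: every function and class name is hypothetical, the agents are plain-string stubs standing in for the GPT-4o and Stable Diffusion XL calls, and producing one chapter per input image is an assumption made here for simplicity.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional

# Genres named in the abstract; "data-driven" is the alternative mode.
GENRES = {"Comedy", "Romance", "Tragedy", "Satire", "Mystery"}


@dataclass
class InputImage:
    path: str
    caption: Optional[str] = None  # optional user caption that steers interpretation


@dataclass
class Chapter:
    text: str
    illustration: str


# --- Hypothetical agent stubs (a real system would call models here) ---

def visual_analyzer(image: InputImage) -> str:
    """Stub: describe the image, folding in the user's caption if present."""
    description = f"description of {image.path}"
    if image.caption:
        description += f" (caption: {image.caption})"
    return description


def storywriter(description: str, mode: str, attempt: int) -> str:
    """Stub: write a chapter draft for the given mode."""
    return f"[{mode}, draft {attempt}] chapter based on {description}"


def illustrator(chapter_text: str) -> str:
    """Stub: return a placeholder for the generated illustration."""
    return f"illustration for: {chapter_text}"


def plot_manager(
    images: List[InputImage],
    mode: Optional[str] = None,
    accept: Callable[[Chapter, int], bool] = lambda chapter, attempt: True,
    max_attempts: int = 3,
) -> List[Chapter]:
    """Coordinate the agents; `accept` models the user approving a chapter."""
    if mode is not None and mode not in GENRES and mode != "data-driven":
        raise ValueError(f"unknown mode: {mode}")
    label = mode or "free"  # no genre: the system decides the structure
    story: List[Chapter] = []
    for image in images:  # assumption: one chapter per input image
        description = visual_analyzer(image)
        for attempt in range(1, max_attempts + 1):
            text = storywriter(description, label, attempt)
            chapter = Chapter(text, illustrator(text))
            if accept(chapter, attempt):
                break
        # If no draft was accepted, the last draft is kept.
        story.append(chapter)
    return story
```

For example, `plot_manager([InputImage("tryst.png", "first meeting")], mode="Romance")` returns a one-chapter story, while passing an `accept` callback that rejects the first draft mimics a user asking for an alternative chapter.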

Paper Structure (10 sections, 3 equations, 8 figures, 2 tables)

Figures (8)

  • Figure 1: First tryst of Guinevere and Lancelot, arranged and observed by Galehaut [lancelot1470].
  • Figure 2: The user interface of ImageTeller.
  • Figure 3: The multi-AI-agent architecture of ImageTeller.
  • Figure 4: Illustration generated by the Juggernaut XL model for Prompt \ref{prompt4}.
  • Figure 5: Hägar and Helga [browne19].
  • ...and 3 more figures