Collaborative Storytelling with Large-scale Neural Language Models

Eric Nichols, Leo Gao, Randy Gomez

TL;DR

This work defines collaborative storytelling as a human-AI turn-taking task and builds a system that tunes GPT-2 large on storytelling data, coupled with a sample-and-rank pipeline to select high-quality continuations. The Generator (tuned on WritingPrompts) proposes multiple options, while the Ranker scores them to choose the best, improving alignment with human preferences. Evaluation shows that both tuning and ranking substantially boost continuation acceptability and human-like engagement, with the tuned+ranked setup preferred in qualitative assessments. The study advances interactive storytelling by integrating large-scale language modeling with ranking-based decision making and points to future work on controllability, genre/mood targeting, and agent-based deployment.

Abstract

Storytelling plays a central role in human socializing and entertainment. However, much of the research on automatic storytelling generation assumes that stories will be generated by an agent without any human interaction. In this paper, we introduce the task of collaborative storytelling, where an artificial intelligence agent and a person collaborate to create a unique story by taking turns adding to it. We present a collaborative storytelling system which works with a human storyteller to create a story by generating new utterances based on the story so far. We constructed the storytelling system by tuning a publicly available large-scale language model on a dataset of writing prompts and their accompanying fictional works. We identify generating sufficiently human-like utterances to be an important technical issue and propose a sample-and-rank approach to improve utterance quality. Quantitative evaluation shows that our approach outperforms a baseline, and we present qualitative evaluation of our system's capabilities.
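
The sample-and-rank idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: `sample_and_rank`, `take_turn`, `toy_generate`, `toy_score`, and the `CANNED` list are hypothetical names introduced here, the toy generator and scorer are simple stand-ins for the tuned language model and the learned Ranker, and the default of ten samples is an assumption chosen only for illustration.

```python
import random
from typing import Callable, List

# Hypothetical type aliases: a generator maps the story so far to one
# candidate continuation; a ranker maps (story, candidate) to a score.
Generator = Callable[[List[str]], str]
Ranker = Callable[[List[str], str], float]


def sample_and_rank(story: List[str], generate: Generator, score: Ranker,
                    num_samples: int = 10) -> str:
    """Sample several candidate continuations and return the best-scoring one."""
    if num_samples < 1:
        raise ValueError("num_samples must be at least 1")
    candidates = [generate(story) for _ in range(num_samples)]
    return max(candidates, key=lambda c: score(story, c))


def take_turn(story: List[str], human_line: str, generate: Generator,
              score: Ranker) -> List[str]:
    """Append the human's line, then the system's selected continuation."""
    extended = story + [human_line]
    return extended + [sample_and_rank(extended, generate, score)]


# Toy stand-ins (assumptions): a real system would sample from a tuned
# language model and score candidates with a trained ranking model.
CANNED = [
    "The dragon opened one eye and yawned.",
    "Somewhere far away, a bell rang.",
    "The knight drew her sword and stepped closer to the dragon.",
]


def toy_generate(story: List[str]) -> str:
    # Ignores the story and picks a canned line at random.
    return random.choice(CANNED)


def toy_score(story: List[str], candidate: str) -> float:
    # Scores a candidate by how many words it shares with the story so far.
    story_words = set(" ".join(story).lower().split())
    return float(len(story_words & set(candidate.lower().split())))


if __name__ == "__main__":
    story = take_turn([], "A dragon slept beneath the old bridge.",
                      toy_generate, toy_score)
    print("\n".join(story))
```

Separating the generator from the ranker means either component can be replaced independently, which mirrors the Generator/Ranker split described in the TL;DR.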

Paper Structure

This paper contains 22 sections, 1 equation, 5 figures, 5 tables.

Figures (5)

  • Figure 1: Collaborative storytelling with an AI agent.
  • Figure 2: The ranking system architecture.
  • Figure 3: Web interface for collaborative storytelling annotation task. Participants select from amongst ten possible story continuations generated by the system before adding their own line to the story.
  • Figure 4: Web interface for storytelling system preference evaluation.
  • Figure 5: Human evaluation of collaborative storytelling systems. We compare the pairs (untuned, tuned) and (tuned, tuned+ranking). Each bar graph shows a comparison of two different systems generating stories through self chat. A larger portion of the bar indicates that system was preferred by evaluators.