Experimental Narratives: A Comparison of Human Crowdsourced Storytelling and AI Storytelling

Nina Begus

TL;DR

The proposed framework argues that fiction can be used as a window into human and AI-based collective imaginary and social dimensions and offers a direct and controlled comparison between human and LLM-generated storytelling.

Abstract

The paper proposes a framework that combines behavioral and computational experiments employing fictional prompts as a novel tool for investigating cultural artifacts and social biases in storytelling by both humans and generative AI. The study analyzes 250 stories authored by crowdworkers in June 2019 and 80 stories generated by GPT-3.5 and GPT-4 in March 2023 by merging methods from narratology and inferential statistics. Both crowdworkers and large language models responded to identical prompts about creating and falling in love with an artificial human. The proposed experimental paradigm allows a direct and controlled comparison between human and LLM-generated storytelling. Responses to the Pygmalionesque prompts confirm the pervasive presence of the Pygmalion myth in the collective imaginary of both humans and large language models. All solicited narratives present a scientific or technological pursuit. The analysis reveals that narratives from GPT-3.5 and particularly GPT-4 are more progressive in terms of gender roles and sexuality than those written by humans. While AI narratives with default settings and no additional prompting can occasionally provide innovative plot twists, they offer less imaginative scenarios and rhetoric than human-authored texts. The proposed framework argues that fiction can be used as a window into human and AI-based collective imaginary and social dimensions.

Paper Structure (35 sections, 5 figures)

Figures (5)

  • Figure 1: Distribution of participant demographics. By age (a), gender (b), education (c), and race (d).
  • Figure 2: Regression estimates from the model in Table A1 in the Appendix. Raw counts are given in Figure A1 in the Appendix.
  • Figure 3: Gender distribution of the human characters in both prompts. Creator/lover (a), creator (b), and lover (c).
  • Figure 4: Comparison of words describing female artificial humans (left) and male artificial humans (right). Because female characters made up 68.8% of artificial humans and male characters 26.8%, the minimum frequency for words describing female artificial humans was set to 5 and the minimum frequency for words describing male artificial humans was set to 2.
  • Figure 5: Regression estimates for the predictors. Prompt (a), participant education (b), gender (c), and age (d). All estimates are given in Table A1 in the Appendix. Raw counts are given in Figure A1 in the Appendix.