Assessing Language Models' Worldview for Fiction Generation

Aisha Khatun, Daniel G. Brown

TL;DR

This paper investigates whether large language models can maintain a fixed internal world state necessary for coherent fiction generation. It first assesses consistency and robustness of nine LLMs using a set of Yes/No prompts about truth versus fiction, then generates 500-word stories from four models based on 20 statements to analyze narrative patterns. The findings show that most models are self-conflicting and largely fail to sustain a manipulable world state, with only zephyr-7b-alpha and GPT-4 Turbo exhibiting relatively stronger consistency. The generated stories exhibit uniform, template-like patterns across models, indicating a lack of true world-building capability. The work highlights the current limitations of vanilla LLMs for fiction and suggests future directions toward explicit state maintenance or targeted fine-tuning to enable reliable story-world creation.

Abstract

The use of Large Language Models (LLMs) has become ubiquitous, with abundant applications in computational creativity. One such application is fictional story generation. Fiction is a narrative that occurs in a story world slightly different from ours. With LLMs becoming writing partners, we question how suitable they are for generating fiction. This study investigates the ability of LLMs to maintain the state of the world that is essential for generating fiction. Through a series of questions posed to nine LLMs, we find that only two models exhibit a consistent worldview, while the rest are self-conflicting. Subsequent analysis of stories generated by four models reveals a strikingly uniform narrative pattern. This uniformity across models further suggests a lack of the 'state' necessary for fiction. We highlight the limitations of current LLMs in fiction writing and advocate for future research to test and create story worlds for LLMs to reside in. All code, the dataset, and the generated responses can be found at https://github.com/tanny411/llm-reliability-and-consistency-evaluation.
