Detecting Mode Collapse in Language Models via Narration

Sil Hamilton

TL;DR

This work probes whether aligned language models retain the ability to enact multiple implied authors in narrative prompts. Using 4,374 stories produced by three OpenAI models with varying degrees of alignment, the study employs BERTopic-based topic analysis to detect virtual author signals and finds that mode collapse increases with more extensive alignment, most notably in gpt-3.5-turbo. The findings suggest that alignment strategies may erode the capacity to generalize across authorial perspectives, with important implications for sociotechnical simulations and responsible deployment. The paper calls for replication, broader model testing, and a shift toward open-weight models to support reproducible, diverse narrative generation research.

Abstract

No two authors write alike. Personal flourishes invoked in written narratives, from lexicon to rhetorical devices, imply a particular author: what literary theorists label the implied or virtual author, as distinct from the real author or narrator of a text. Early large language models, trained on unfiltered training sets drawn from a variety of discordant sources, yielded incoherent personalities, which was problematic for conversational tasks but proved useful for sampling literature from multiple perspectives. Successes in alignment research in recent years have allowed researchers to impose subjectively consistent personae on language models via instruction tuning and reinforcement learning from human feedback (RLHF), but whether aligned models retain the ability to model an arbitrary virtual author has received little scrutiny. By studying 4,374 stories sampled from three OpenAI language models, we show that successive versions of GPT-3 suffer from increasing degrees of "mode collapse", whereby overfitting the model during alignment constrains it from generalizing over authorship: models suffering from mode collapse become unable to assume a multiplicity of perspectives. Our method and results are significant for researchers seeking to employ language models in sociological simulations.
