Whose story is it? Personalizing story generation by inferring author styles
Nischal Ashok Kumar, Chau Minh Pham, Mohit Iyyer, Andrew Lan
TL;DR
This work introduces Mythos, a multi-source dataset of 3.6k stories from 112 authors, to study personalization in long-form story generation. It proposes a two-stage pipeline centered on an Author Writing Sheet, which captures implicit author characteristics across four narrative dimensions (Plot, Creativity, Development, Language Use) and guides generation through an LLM-simulated author persona and personalized story rules. Automatic and human evaluations show that personalization using the Author Writing Sheet (and its Summ variant) outperforms non-personalized baselines in both faithfulness to an author's history and similarity to ground-truth author stories, with particularly strong gains on Reddit stories and in the Creativity and Language Use dimensions. An Oracle upper bound shows that a gap remains before generation is truly personalized, which underlines the difficulty of long-form author-specific generation and points to future work on broader profiling coverage, multi-agent narrative systems, and cross-model generalization. Overall, the work demonstrates that structured, interpretable author profiling can meaningfully tailor long-form storytelling, with potential impact on education, writing assistance, and human-AI co-creation.
Abstract
Personalization is critical for improving user experience in interactive writing and educational applications, yet remains understudied in story generation. We study the task of personalizing story generation, where our goal is to mimic an author's writing style, given other stories written by them. We collect Mythos, a dataset of 3.6k stories from 112 authors, with an average of 16 stories per author, across five distinct sources reflecting diverse story-writing settings. We propose a two-stage pipeline for personalized story generation: first, we infer authors' implicit writing characteristics and organize them into an Author Writing Sheet, which is validated by humans to be of high quality; second, we simulate the author's persona using tailored persona descriptions and personalized story rules. We find that stories personalized using the Author Writing Sheet outperform a non-personalized baseline, achieving a 78% win-rate in capturing authors' past style and 59% in similarity to ground-truth author stories. Human evaluation supports these findings and further highlights trends, such as Reddit stories being easier to personalize, and the Creativity and Language Use aspects of stories being easier to personalize than the Plot.
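To make the two ideas in the abstract concrete, the following is a minimal sketch, not the authors' implementation. It assumes an Author Writing Sheet can be represented as free-text style claims grouped under the four narrative dimensions, and that a win-rate is the fraction of pairwise comparisons a method wins. The names AuthorWritingSheet, add_claim, to_prompt, and win_rate are hypothetical, as are the judgment labels and the choice to count ties as losses.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# The four narrative dimensions named in the text.
CATEGORIES = ("Plot", "Creativity", "Development", "Language Use")


@dataclass
class AuthorWritingSheet:
    """Hypothetical container: free-text style claims grouped by dimension."""

    author_id: str
    claims: dict[str, list[str]] = field(
        default_factory=lambda: {c: [] for c in CATEGORIES}
    )

    def add_claim(self, category: str, claim: str) -> None:
        # Reject anything outside the four dimensions.
        if category not in self.claims:
            raise ValueError(f"unknown category: {category}")
        self.claims[category].append(claim)

    def to_prompt(self) -> str:
        """Render non-empty dimensions as plain text for use in an LLM prompt."""
        lines: list[str] = []
        for category in CATEGORIES:
            if self.claims[category]:
                lines.append(f"{category}:")
                lines.extend(f"- {c}" for c in self.claims[category])
        return "\n".join(lines)


def win_rate(judgments: list[str], method: str = "personalized") -> float:
    """Fraction of pairwise judgments won by `method` (assumption: ties count as losses)."""
    if not judgments:
        raise ValueError("no judgments to aggregate")
    return sum(j == method for j in judgments) / len(judgments)
```

In this sketch, a sheet built from an author's past stories would be rendered with to_prompt() and passed to the generation stage, and win_rate would aggregate per-story pairwise preferences between the personalized and non-personalized outputs.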
