Whose story is it? Personalizing story generation by inferring author styles

Nischal Ashok Kumar, Chau Minh Pham, Mohit Iyyer, Andrew Lan

TL;DR

This work introduces Mythos, a multi-source dataset of 3.6k stories from 112 authors to study personalization in long-form story generation. It proposes a two-stage pipeline centered on an Author Writing Sheet that captures implicit author characteristics across four narrative dimensions (Plot, Creativity, Development, Language Use) and guides LLM-based persona and rule-driven generation. Automatic and human evaluations show that personalization using the Author Writing Sheet (and its Summ variant) outperforms non-personalized baselines in faithfulness to an author's history and similarity to ground-truth author stories, with particularly strong gains on Reddit and in Creativity and Language Use. An Oracle upper bound reveals the gap that remains before generation is truly personalized, which highlights the difficulty of long-form author-specific generation and suggests future work on expanding profiling coverage, multi-agent narrative systems, and cross-model generalization. Overall, the work demonstrates that structured, interpretable author profiling can meaningfully tailor long-form storytelling, with potential impact on education, writing assistance, and human-AI co-creation.

Abstract

Personalization is critical for improving user experience in interactive writing and educational applications, yet remains understudied in story generation. We study the task of personalizing story generation, where our goal is to mimic an author's writing style, given other stories written by them. We collect Mythos, a dataset of 3.6k stories from 112 authors, with an average of 16 stories per author, across five distinct sources reflecting diverse story-writing settings. We propose a two-stage pipeline for personalized story generation: first, we infer authors' implicit writing characteristics and organize them into an Author Writing Sheet, which is validated by humans to be of high quality; second, we simulate the author's persona using tailored persona descriptions and personalized story rules. We find that stories personalized using the Author Writing Sheet outperform a non-personalized baseline, achieving a 78% win-rate in capturing authors' past style and 59% in similarity to ground-truth author stories. Human evaluation supports these findings and further highlights trends, such as Reddit stories being easier to personalize, and the Creativity and Language Use aspects of stories being easier to personalize than the Plot.
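The win-rates quoted above are proportions of pairwise comparisons in which the personalized story is preferred over the baseline. The following is a minimal sketch of that arithmetic, not the paper's evaluation code: the function name `win_rate`, the label strings, the half-credit treatment of ties, and the toy judgments are all assumptions made for illustration.

```python
from collections import Counter


def win_rate(judgments, method="personalized", tie_credit=0.5):
    """Share of pairwise comparisons won by `method`.

    judgments: iterable of labels, each the name of the preferred system
               or the string "tie".
    tie_credit: fraction of a win awarded for a tie (an assumption here;
                set to 0.0 to count ties as losses).
    """
    counts = Counter(judgments)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no judgments to score")
    return (counts[method] + tie_credit * counts["tie"]) / total


# Hypothetical judgments for ten story pairs (not data from the paper).
toy_judgments = ["personalized"] * 7 + ["baseline"] * 2 + ["tie"]
toy_rate = win_rate(toy_judgments)  # (7 + 0.5 * 1) / 10 = 0.75
```

Under this convention, a win-rate above 0.5 means the personalized system is preferred more often than the baseline.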

Paper Structure

This paper contains 102 sections, 23 figures, 28 tables, 1 algorithm.

Figures (23)

  • Figure 1: Our two-stage pipeline for personalized story generation. Stage 1 constructs an Author Writing Sheet with Claim ($C$) and Evidence ($E$) pairs capturing the author’s story-writing characteristics across narrative categories. It is derived from the author’s history of writing prompts ($wp$), author-written stories ($s_a$), and LLM-generated Average Stories ($s_b$) representing a typical author's response to the same prompt. Stage 2 uses the Author Writing Sheet to role-play the author, incorporating tailored story rules and a persona description for personalized generation.
  • Figure 2: Win-rate proportions across narrative categories for Similarity to Author Story on the Reddit source. O: Oracle, D: Delta, Sh: Sheet, Su: Summ.
  • Figure 3: Story-writing themes for the Reddit source covered in Mythos.
  • Figure 4: Prompt for generating writing prompts for the stories in our dataset.
  • Figure 5: Average Author Prompt for AO3.
  • ...and 18 more figures
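
The Figure 1 caption describes an Author Writing Sheet made of Claim and Evidence pairs grouped by narrative category, which Stage 2 then uses to role-play the author. One way to picture that structure is the sketch below. It is an illustration under stated assumptions, not the authors' implementation: the class names, the `add` and `to_prompt` methods, the prompt wording, and the example entry are all hypothetical.

```python
from dataclasses import dataclass, field

# The four narrative categories named in the summary above.
CATEGORIES = ("Plot", "Creativity", "Development", "Language Use")


@dataclass
class ClaimEvidence:
    claim: str     # a statement about how the author writes
    evidence: str  # a supporting excerpt from one of the author's stories


@dataclass
class AuthorWritingSheet:
    # Hypothetical container: category name -> list of Claim/Evidence pairs.
    entries: dict = field(default_factory=lambda: {c: [] for c in CATEGORIES})

    def add(self, category: str, claim: str, evidence: str) -> None:
        if category not in self.entries:
            raise KeyError(f"unknown category: {category}")
        self.entries[category].append(ClaimEvidence(claim, evidence))

    def to_prompt(self, writing_prompt: str) -> str:
        """Render the sheet as role-play instructions (illustrative wording)."""
        lines = ["You are role-playing an author with these writing traits:"]
        for category in CATEGORIES:
            for pair in self.entries[category]:
                lines.append(
                    f"- [{category}] {pair.claim} (evidence: {pair.evidence})"
                )
        lines.append(f"Writing prompt: {writing_prompt}")
        return "\n".join(lines)


# Hypothetical example entry and writing prompt.
sheet = AuthorWritingSheet()
sheet.add("Language Use", "Favors short, clipped sentences.",
          "He ran. Nobody followed.")
prompt_text = sheet.to_prompt("A lighthouse keeper finds a message in a bottle.")
```

The resulting `prompt_text` would be passed to an LLM of the reader's choice; the paper's own persona descriptions and story rules are more elaborate than this single rendered string.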