MirrorStories: Reflecting Diversity through Personalized Narrative Generation with Large Language Models

Sarfaroz Yunusov, Hamza Sidat, Ali Emami

TL;DR

MirrorStories demonstrates that large language models can generate personalized narratives that reflect diverse reader identities, outperforming generic narratives in engagement and personal relevance without sacrificing core morals. By constructing a 1,500-story corpus with explicit identity traits and morals, and evaluating with 26 diverse evaluators plus GPT-4, the work shows that personalization increases textual diversity and reader connection while maintaining comprehension. The study also investigates biases in model evaluations, explores multimodal extension with image generation, and provides a public web app to encourage further research. Limitations include evaluator demographics, scope of personalization, and model variety, which point to valuable future work in broader populations and model ecosystems.

Abstract

This study explores the effectiveness of Large Language Models (LLMs) in creating personalized "mirror stories" that reflect and resonate with individual readers' identities, addressing the significant lack of diversity in literature. We present MirrorStories, a corpus of 1,500 personalized short stories generated by integrating elements such as name, gender, age, ethnicity, reader interest, and story moral. We demonstrate that LLMs can effectively incorporate diverse identity elements into narratives, with human evaluators identifying personalized elements in the stories with high accuracy. Through a comprehensive evaluation involving 26 diverse human judges, we compare the effectiveness of MirrorStories against generic narratives. We find that personalized LLM-generated stories not only outscore generic human-written and LLM-generated ones across all metrics of engagement (with average ratings of 4.22 versus 3.37 on a 5-point scale), but also achieve higher textual diversity while preserving the intended moral. We also provide analyses that include bias assessments and a study on the potential for integrating images into personalized stories.

Paper Structure

This paper contains 41 sections, 19 figures, and 7 tables.

Figures (19)

  • Figure 1: Generation and evaluation process for human-written generic, LLM-generated generic, and LLM-generated personalized narratives
  • Figure 2: Illustration demonstrating the personalization validation and impact processes
  • Figure 3: Accuracy of human and LLM evaluators in identifying identity elements in the story
  • Figure 4: Comparative evaluation of narrative types by human and GPT-4 evaluators across different metrics
  • Figure 5: Average ratings by GPT-4 across gender
  • ...and 14 more figures