MirrorStories: Reflecting Diversity through Personalized Narrative Generation with Large Language Models

Sarfaroz Yunusov; Hamza Sidat; Ali Emami

TL;DR

MirrorStories demonstrates that large language models can generate personalized narratives that reflect diverse reader identities, outperforming generic narratives in engagement and personal relevance without sacrificing core morals. By constructing a 1,500-story corpus with explicit identity traits and morals, and evaluating with 26 diverse evaluators plus GPT-4, the work shows that personalization increases textual diversity and reader connection while maintaining comprehension. The study also investigates biases in model evaluations, explores multimodal extension with image generation, and provides a public web app to encourage further research. Limitations include evaluator demographics, scope of personalization, and model variety, which point to valuable future work in broader populations and model ecosystems.

Abstract

This study explores the effectiveness of Large Language Models (LLMs) in creating personalized "mirror stories" that reflect and resonate with individual readers' identities, addressing the significant lack of diversity in literature. We present MirrorStories, a corpus of 1,500 personalized short stories generated by integrating elements such as name, gender, age, ethnicity, reader interest, and story moral. We demonstrate that LLMs can effectively incorporate diverse identity elements into narratives, with human evaluators identifying personalized elements in the stories with high accuracy. Through a comprehensive evaluation involving 26 diverse human judges, we compare the effectiveness of MirrorStories against generic narratives. We find that personalized LLM-generated stories not only outscore generic human-written and LLM-generated ones across all metrics of engagement (with average ratings of 4.22 versus 3.37 on a 5-point scale), but also achieve higher textual diversity while preserving the intended moral. We also provide analyses that include bias assessments and a study on the potential for integrating images into personalized stories.

Paper Structure (41 sections, 19 figures, 7 tables)

Introduction
MirrorStories
Overview
Dataset Collection
Human-written Stories & Morals
Identities
Generic & Personalized LLM-Generated Stories
Experiments
Prompts
Human Evaluation
Models
Results
Are MirrorStories personalized?
Are MirrorStories preferred?
How does personalization affect moral comprehension?
...and 26 more sections

Figures (19)

Figure 1: Generation and evaluation process for human-written generic, LLM-generated generic, and LLM-generated personalized narratives
Figure 2: Illustration demonstrating the personalization validation and impact processes
Figure 3: Accuracy of human and LLM evaluators in identifying identity elements in the story
Figure 4: Comparative evaluation of narrative types by human and GPT-4 evaluators across different metrics
Figure 5: Average ratings by GPT-4 across gender
...and 14 more figures
