
Not All Personas Are Worth It: Culture-Reflective Persona Data Augmentation

Ji-Eun Han, Yoonseok Heo

TL;DR

The paper addresses a gap in existing persona datasets: they fail to capture cultural specificity, which hinders the development of culturally aware language models. It introduces a two-step pipeline, Persona Relevance Filtering followed by Culture-reflective Persona Editing, that transforms a diverse pool of PersonasHub-derived personas into culture-aligned Korean personas, producing KoPersona (200k personas). KoPersona comprises a general subset and a culture subset and is validated through both quantitative metrics (e.g., P-Acc, BLEU-2, Jaccard) and qualitative human-like evaluations, which show stronger cultural alignment than the baseline together with competitive diversity. The work offers a scalable framework for cross-cultural persona augmentation that supports culturally aware model training and can be adapted to languages and cultural contexts beyond Korean.
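Jaccard appears among the quantitative metrics above. This overview does not say how it is computed for KoPersona, so the sketch below assumes one common formulation for dataset diversity: Jaccard similarity between the lowercase whitespace-token sets of two personas, averaged over all unordered pairs, where a lower mean suggests more lexical diversity. The tokenization, the aggregation, and the function names `jaccard` and `mean_pairwise_jaccard` are illustrative assumptions, not the paper's documented procedure.

```python
from itertools import combinations


def jaccard(a: str, b: str) -> float:
    """Jaccard similarity between the lowercase whitespace-token sets of two texts.

    Assumption: plain whitespace tokenization; the paper may tokenize differently.
    """
    set_a, set_b = set(a.lower().split()), set(b.lower().split())
    if not set_a and not set_b:
        return 1.0  # two empty texts are treated as identical
    return len(set_a & set_b) / len(set_a | set_b)


def mean_pairwise_jaccard(texts: list[str]) -> float:
    """Average Jaccard similarity over all unordered pairs of texts.

    Lower values indicate less token overlap, i.e. a more lexically diverse set.
    """
    pairs = list(combinations(texts, 2))
    if not pairs:
        return 0.0  # fewer than two texts: no pairs to compare
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)


personas = ["a teacher in Seoul", "a nurse in Busan", "a teacher in Busan"]
print(round(mean_pairwise_jaccard(personas), 3))  # 0.511
```

Averaging over all pairs is quadratic in the number of personas, so for a 200k-persona dataset one would likely sample pairs rather than enumerate them all.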

Abstract

Incorporating personas into conversational AI models is crucial for achieving authentic and engaging interactions. However, the cultural diversity and adaptability of existing persona datasets are often overlooked, which reduces their efficacy for building culturally aware AI systems. To address this issue, we propose a two-step pipeline for generating culture-specific personas and introduce KoPersona, a dataset of 200,000 personas designed to capture Korean cultural values, behaviors, and social nuances. A comprehensive evaluation across various metrics validates the quality of KoPersona and its relevance to Korean culture. This work not only contributes to persona-based research but also establishes a scalable approach for creating culturally relevant personas that can be adapted to various languages and cultural contexts.


Paper Structure

This paper contains 10 sections, 2 figures, 3 tables.

Figures (2)

  • Figure 1: The process of generating culture-reflective personas by adapting a culturally irrelevant persona into a contextually appropriate one.
  • Figure 2: Overview of the proposed pipeline.
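The two-step pipeline named in the TL;DR (Persona Relevance Filtering followed by Culture-reflective Persona Editing) can be read as a filter-then-rewrite composition. The sketch below shows only that ordering. The relevance check and the editor are hypothetical toy stand-ins (`toy_relevance_filter`, `toy_culture_editor`, and `augment_personas` are not from the paper), and the paper's actual filtering criteria and editing method are not reproduced here.

```python
from typing import Callable, Iterable


# HYPOTHETICAL stand-in for Persona Relevance Filtering: a simple keyword rule,
# not the paper's criterion.
def toy_relevance_filter(persona: str) -> bool:
    """Return False for personas containing markers assumed irrelevant here."""
    irrelevant_markers = ("NFL", "Thanksgiving")
    return not any(marker in persona for marker in irrelevant_markers)


# HYPOTHETICAL stand-in for Culture-reflective Persona Editing: a fixed string
# substitution, not the paper's editing method.
def toy_culture_editor(persona: str) -> str:
    """Swap a generic setting for a Korean one (illustrative only)."""
    return persona.replace("a local high school", "a high school in Seoul")


def augment_personas(
    personas: Iterable[str],
    is_relevant: Callable[[str], bool],
    edit: Callable[[str], str],
) -> list[str]:
    """Step 1: keep personas that pass the filter. Step 2: edit each survivor."""
    return [edit(p) for p in personas if is_relevant(p)]


source = [
    "A math teacher at a local high school",
    "A sports analyst who covers the NFL",
    "A retired nurse who enjoys gardening",
]
print(augment_personas(source, toy_relevance_filter, toy_culture_editor))
# ['A math teacher at a high school in Seoul', 'A retired nurse who enjoys gardening']
```

Passing the filter and the editor as callables keeps the two steps independent, so either stand-in could be swapped for a model-based component without changing the composition.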