
PSYDIAL: Personality-based Synthetic Dialogue Generation using Large Language Models

Ji-Eun Han, Jun-Seok Koh, Hyeon-Tae Seo, Du-Seong Chang, Kyung-Ah Sohn

TL;DR

PSYDIAL introduces an end-to-end, personality-based synthetic dialogue data generation pipeline that prompts Large Language Models to elicit personality-consistent responses. It also presents PSYDIAL, a Korean dialogue dataset focused on Extraversion and created through a five-stage process (Personality Setting, Profile Selecting, Dialogue Generation, Dialogue Filtering, Dialogue Regeneration). Empirical results show that models trained on PSYDIAL reflect personality traits better than baselines, and that setting the system personality yields up to 88% Personality Accuracy. The approach is broadly applicable across languages and tasks and holds promise for more nuanced, personality-driven conversational AI in Korean and beyond; the code is publicly available.
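The five stages can be read as a loop over target personalities. The Python sketch below is a minimal illustration under stated assumptions, not the authors' implementation: `llm` and `judge` are hypothetical callables standing in for the prompted LLM and for whatever check the Dialogue Filtering stage applies, `build_prompt` is a hypothetical template, the personality labels are placeholders, and treating a failed filter check as the trigger for regeneration is our assumption. The actual prompts and filtering criteria are in the authors' repository.

```python
import random
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Dialogue:
    personality: str   # target personality label, e.g. "extravert" (hypothetical label)
    profile: str       # profile chosen in the Profile Selecting stage
    turns: List[str]   # generated utterances


# Hypothetical interfaces: an LLM wrapper that turns a prompt into utterances,
# and a judge that decides whether a dialogue reflects the target personality.
LLM = Callable[[str], List[str]]
Judge = Callable[[Dialogue], bool]


def build_prompt(personality: str, profile: str) -> str:
    # Hypothetical template; not the prompt used for PSYDIAL.
    return (
        "Write a dialogue between a user and a chatbot. "
        f"The chatbot's personality is {personality}. Profile: {profile}."
    )


def run_pipeline(
    personalities: List[str],
    profiles: List[str],
    llm: LLM,
    judge: Judge,
    n_per_personality: int,
    max_retries: int = 1,
    seed: int = 0,
) -> List[Dialogue]:
    rng = random.Random(seed)
    kept: List[Dialogue] = []
    for personality in personalities:                      # 1. Personality Setting
        for _ in range(n_per_personality):
            profile = rng.choice(profiles)                  # 2. Profile Selecting
            prompt = build_prompt(personality, profile)
            dialogue = Dialogue(personality, profile, llm(prompt))  # 3. Dialogue Generation
            ok = judge(dialogue)                            # 4. Dialogue Filtering
            retries = 0
            while not ok and retries < max_retries:         # 5. Dialogue Regeneration
                dialogue = Dialogue(personality, profile, llm(prompt))
                ok = judge(dialogue)
                retries += 1
            if ok:
                kept.append(dialogue)                       # dialogues still failing are dropped
    return kept
```

With stub callables, e.g. `run_pipeline(["extravert", "introvert"], ["student", "office worker"], lambda p: ["Hi!", "Hello!"], lambda d: True, n_per_personality=3)`, every dialogue passes the filter and six are kept. Capping regeneration with `max_retries` keeps a prompt that never satisfies the filter from looping forever.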

Abstract

We present a novel end-to-end personality-based synthetic dialogue data generation pipeline, specifically designed to elicit responses from large language models via prompting. We design the prompts to generate more human-like dialogues that reflect real-world scenarios in which users engage with chatbots. We introduce PSYDIAL, the first Korean dialogue dataset focused on personality-based dialogues, curated using our proposed pipeline. We focus specifically on the Extraversion dimension of the Big Five personality model. Experimental results indicate that while pre-trained models and models fine-tuned on a chit-chat dataset struggle to generate responses that reflect personality, models trained on PSYDIAL show significant improvements. The versatility of our pipeline extends beyond dialogue tasks, offering potential for non-dialogue applications. This research opens doors for more nuanced, personality-driven conversational AI in Korean and potentially other languages. Our code is publicly available at https://github.com/jiSilverH/psydial.
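To make the Personality Accuracy figure from the TL;DR concrete, the sketch below assumes the metric is the fraction of generated responses whose predicted personality label matches the target label. That definition, the `classify` callable, and the punctuation-based `toy_classify` stand-in are all assumptions for illustration; the paper's actual scoring procedure may differ.

```python
from typing import Callable, Sequence


def personality_accuracy(
    responses: Sequence[str],
    targets: Sequence[str],
    classify: Callable[[str], str],
) -> float:
    """Fraction of responses whose predicted personality matches the target.

    Assumed definition for illustration; `classify` is a hypothetical
    personality classifier (e.g. a fine-tuned model) supplied by the caller.
    """
    if len(responses) != len(targets):
        raise ValueError("responses and targets must have the same length")
    if not responses:
        raise ValueError("at least one response is required")
    correct = sum(classify(r) == t for r, t in zip(responses, targets))
    return correct / len(responses)


def toy_classify(text: str) -> str:
    # Toy stand-in only: treats an exclamation mark as a sign of extraversion.
    return "extravert" if "!" in text else "introvert"


responses = ["Let's go out tonight!", "I'd rather stay home.", "Party time!", "Sounds fun!"]
targets = ["extravert", "introvert", "extravert", "introvert"]
print(personality_accuracy(responses, targets, toy_classify))  # 0.75
```

Passing the classifier in as a parameter keeps the metric independent of any particular model, so a stronger personality classifier can replace the toy one without changing the scoring code.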

Paper Structure (34 sections, 1 equation, 3 figures, 6 tables)

Figures (3)

  • Figure 1: Overview of the proposed data generation pipeline.
  • Figure 2: Text embeddings during the Dialogue Filtering stage. Left: before Dialogue Filtering; right: after Dialogue Filtering.
  • Figure 3: Generated dialogue sample