Personalized Dialogue Generation with Diversified Traits

Yinhe Zheng, Guanyi Chen, Minlie Huang, Song Liu, Xuan Zhu

TL;DR

The paper addresses personalized dialogue generation by explicitly modeling diverse personality traits. It introduces PersonalDialog, a large-scale dataset collected from Weibo in which each speaker is annotated with traits, and a trait fusion module that produces a persona representation. This representation conditions decoding through two mechanisms: persona-aware attention and persona-aware bias. In automatic and manual evaluations, the variant that combines trait attention with persona-aware bias (Att. + PAB) outperforms the baselines and expresses the appropriate traits in different contexts while preserving fluency and relevance. The authors also describe anonymization schemes that protect speaker privacy, and the dataset opens avenues for sociolinguistic research and for trait-aware dialogue systems built on real-world data.

Abstract

Endowing a dialogue system with particular personality traits is essential to deliver more human-like conversations. However, due to the challenge of embodying personality via language expression and the lack of large-scale persona-labeled dialogue data, this research problem is still far from well-studied. In this paper, we investigate the problem of incorporating explicit personality traits in dialogue generation to deliver personalized dialogues. To this end, we first construct PersonalDialog, a large-scale multi-turn dialogue dataset containing various traits from a large number of speakers. The dataset consists of 20.83M sessions and 56.25M utterances from 8.47M speakers. Each utterance is associated with a speaker who is marked with traits such as Age, Gender, Location, and Interest Tags. Several anonymization schemes are designed to protect the privacy of each speaker. This large-scale dataset will facilitate not only the study of personalized dialogue generation, but also other research in sociolinguistics and social science. Second, to study how personality traits can be captured and addressed in dialogue generation, we propose persona-aware dialogue generation models within the sequence-to-sequence learning framework. Explicit personality traits (structured as key-value pairs) are embedded using a trait fusion module. During the decoding process, two techniques, namely persona-aware attention and persona-aware bias, are devised to capture and address trait-related information. Experiments demonstrate that our model is able to address proper traits in different contexts. Case studies also show interesting results for this challenging research problem.
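To make the idea of key-value traits passing through a trait fusion module concrete, here is a minimal sketch in plain Python. It is not the authors' implementation: the fusion rule shown (averaging the trait embeddings) is only one simple option chosen for illustration, and the trait values, `EMBED_DIM`, `TRAIT_TABLES`, and `fuse_traits_by_average` are hypothetical names introduced here. Random vectors stand in for learned embeddings.

```python
import random

# Assumption: a tiny embedding size and a toy trait vocabulary, for illustration only.
EMBED_DIM = 4
random.seed(0)


def make_embedding_table(values):
    """Give every trait value a small random vector (a stand-in for learned embeddings)."""
    return {v: [random.uniform(-1.0, 1.0) for _ in range(EMBED_DIM)] for v in values}


# One embedding table per trait key; keys and values here are hypothetical examples.
TRAIT_TABLES = {
    "Gender": make_embedding_table(["Female", "Male"]),
    "Age": make_embedding_table(["Post-80s", "Post-90s"]),
    "Location": make_embedding_table(["Yunnan", "Beijing"]),
}


def fuse_traits_by_average(persona):
    """Fuse key-value traits into a single persona vector by averaging their embeddings."""
    vectors = [TRAIT_TABLES[key][value] for key, value in persona.items()]
    # zip(*vectors) groups the i-th component of every trait embedding together.
    return [sum(component) / len(vectors) for component in zip(*vectors)]


# A speaker described by explicit key-value traits.
persona = {"Gender": "Female", "Age": "Post-90s", "Location": "Yunnan"}
v_p = fuse_traits_by_average(persona)
print(len(v_p))
```

The resulting vector `v_p` plays the role of the persona representation that the decoder is conditioned on. A set-valued trait such as Interest Tags would need its own pooling step, which this sketch leaves out.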

Paper Structure

This paper contains 36 sections, 8 equations, 5 figures, 8 tables.

Figures (5)

  • Figure 1: An example dialogue session (translated) from our dataset. Several personality traits are given for each speaker. Words in the response are shown in the same color as the corresponding traits.
  • Figure 2: Overview of the personalized dialogue generation model. To obtain the persona representation $\boldsymbol{v}_p$, different traits are integrated by the personality trait fusion component. $\boldsymbol{v}_p$ is then used either to generate persona-aware attention weights for computing the context vector, or to produce a persona-aware bias for computing the generation distribution.
  • Figure 3: Statistics of personality traits. (a) Distributions of the Age and Gender traits. The red and blue bars correspond to female and male speakers, respectively; (b) Distribution of the 21 most frequent Locations (provinces); (c) Word cloud visualization of the 250 most frequent Interest Tags (translated). The 10 most frequent tags are "Travel", "Food", "Entertainment", "Funny-humor", "Celebrity", "Music", "Fashion", "Literature", "Video-music" and "Post-90s".
  • Figure 4: Distribution of the activeness level of collected Weibo users.
  • Figure 5: Visualization of trait attention scores for our model (Att. + PAB). The generated response is "来吧来吧,我还在等你" ("Come on, come on, I am still waiting for you (in Yunnan)"). "云南" ("Yunnan") is a province in China.
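The two decoding mechanisms named in the Figure 2 caption can be sketched as follows. This is a rough illustration under stated assumptions, not the paper's formulation: the attention score is a plain dot product against the decoder state shifted by $\boldsymbol{v}_p$, and the bias is a fixed scalar gate that mixes decoder logits with persona-derived logits. The function names and all numeric values are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def persona_aware_attention(encoder_states, decoder_state, v_p):
    """Attention weights that depend on the persona vector, plus the resulting context vector.

    Assumption: the query is decoder_state + v_p and scores are dot products.
    """
    query = [s + p for s, p in zip(decoder_state, v_p)]
    weights = softmax([dot(h, query) for h in encoder_states])
    context = [
        sum(w * h[i] for w, h in zip(weights, encoder_states))
        for i in range(len(query))
    ]
    return weights, context


def persona_aware_bias(decoder_logits, persona_logits, gate):
    """Generation distribution from a gated mix of decoder and persona logits.

    Assumption: gate is a scalar in [0, 1]; a learned, per-step gate is also possible.
    """
    mixed = [gate * d + (1.0 - gate) * p for d, p in zip(decoder_logits, persona_logits)]
    return softmax(mixed)


# Toy inputs: two encoder states, a 2-d decoder state, and a 2-d persona vector.
encoder_states = [[1.0, 0.0], [0.0, 1.0]]
decoder_state = [0.5, 0.5]
v_p = [0.0, 2.0]
weights, context = persona_aware_attention(encoder_states, decoder_state, v_p)

# Toy vocabulary of three words: the decoder favors word 0, the persona favors word 2.
probs = persona_aware_bias([2.0, 0.0, 0.0], [0.0, 0.0, 2.0], gate=0.5)
print(weights, probs)
```

In the toy run, the persona vector shifts attention toward the second encoder state, and the gate lets a persona-preferred word compete with the decoder's preferred word in the output distribution.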