Recent Trends in Personalized Dialogue Generation: A Review of Datasets, Methodologies, and Evaluations

Yi-Pei Chen; Noriki Nishida; Hideki Nakayama; Yuji Matsumoto

TL;DR

The paper systematically surveys datasets, methodologies, and evaluations for personalized dialogue generation. It formalizes the problem as conditional generation $p(R \mid D, P)$ in a bilateral agent–user setup and groups methodological approaches into five families (consistency/coherence, persona–context balancing, relevant persona selection, unknown persona modeling, and data scarcity), with additional discussion of LLMs and in-context learning. Key contributions include a taxonomy of 22 datasets (highlighting PersonaChat, grounding, and multilingual extensions), a synthesis of 17 recent works (2021–2023) from major conferences, and a critical assessment of evaluation metrics and the need for standardization. The findings underscore challenges in dataset size, quality, and diversity; the need to model both agent and user personas; and the need for standardized benchmarks that allow fair, reproducible assessment, guiding future research toward more robust, contextually aware, and scalable personalized dialogue systems.

Abstract

Enhancing user engagement through personalization in conversational agents has gained significance, especially with the advent of large language models that generate fluent responses. Personalized dialogue generation, however, is multifaceted and varies in its definition, ranging from instilling a persona in the agent to capturing users' explicit and implicit cues. This paper seeks to systematically survey the recent landscape of personalized dialogue generation, including the datasets employed, methodologies developed, and evaluation metrics applied. Covering 22 datasets, we highlight benchmark datasets and newer ones enriched with additional features. We further analyze 17 seminal works from top conferences between 2021 and 2023 and identify five distinct types of problems. We also shed light on recent progress by LLMs in personalized dialogue generation. Our evaluation section offers a comprehensive summary of the assessment facets and metrics used in these works. In conclusion, we discuss prevailing challenges and envision prospective directions for future research in personalized dialogue generation.

Paper Structure (28 sections, 1 figure, 5 tables)

Introduction
Datasets
Datasets Review
Facets
Persona Representation
Domain and Language Biases
Methodology
Problem Statement
Approaches
Consistency and Coherence
Persona-Context Balancing
Relevant Persona Selection
Unknown Persona Modeling
Data Scarcity
Large Language Models and In-Context Learning
...and 13 more sections

Figures (1)

Figure 1: An overview of personalized dialogue generation. Assuming that the conversation takes place between two speakers, i.e., an agent $A$ and a user $U$, the goal is to generate the response $R$ given the dialogue context $C$ or the last utterance $Q$, together with either the explicit persona of the agent or user ($P_A$ or $P_U$) or their implicit utterance histories ($H_A$ or $H_U$).
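
The setup in Figure 1 can be illustrated with a minimal sketch of how the inputs might be packaged for a conditional generator. This is only an illustration under stated assumptions: the class `DialogueExample`, the function `build_model_input`, and the `[PERSONA]`, `[HISTORY]`, `[TURN]`, and `[SEP]` markers are hypothetical names introduced here, not a format taken from the paper or from any surveyed work.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class DialogueExample:
    """One conditioning input, using the symbols of Figure 1 (hypothetical container)."""
    context: List[str]                                  # C: dialogue turns so far; the last one is Q
    persona: List[str] = field(default_factory=list)    # P_A or P_U: explicit persona sentences
    history: List[str] = field(default_factory=list)    # H_A or H_U: past utterances (implicit persona)


def build_model_input(ex: DialogueExample, sep: str = " [SEP] ") -> str:
    """Flatten persona (or history) plus context into a single conditioning string.

    The explicit persona is used when it is available; otherwise the utterance
    history stands in as the implicit persona signal. The tags are illustrative.
    """
    profile = ex.persona if ex.persona else ex.history
    tag = "[PERSONA]" if ex.persona else "[HISTORY]"
    parts = [f"{tag} {p}" for p in profile] + [f"[TURN] {u}" for u in ex.context]
    return sep.join(parts)


explicit = DialogueExample(
    context=["Hi!", "What do you do for fun?"],
    persona=["I love hiking."],
)
implicit = DialogueExample(
    context=["Hi!"],
    history=["I went climbing yesterday."],
)
print(build_model_input(explicit))
print(build_model_input(implicit))
```

A generator would then be trained or prompted to produce $R$ from this string. Whether persona and context are concatenated, encoded separately, or retrieved selectively is exactly where the surveyed approaches differ, so the flat concatenation above is only the simplest possible choice.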
