Persona-Aware Alignment Framework for Personalized Dialogue Generation

Guanrong Li, Xinyu Liu, Zhen Wu, Xinyu Dai

TL;DR

This paper tackles persona-consistent dialogue generation, arguing that token-level training alone does not capture the given personas. It introduces the Persona-Aware Alignment Framework (PAL), a two-stage training scheme consisting of Persona-aware Learning and Persona Alignment, complemented by a Select-then-Generate inference strategy that improves persona alignment at the semantic level. The first stage jointly learns which persona is relevant and how to generate persona-aware responses; the second stage directly optimizes alignment with the given personas using Direct Preference Optimization (DPO) on constructed pairs of gold and model-generated responses. Across English and Chinese datasets and multiple foundation models, PAL reports significant gains over state-of-the-art baselines and over several closed-source LLMs, which suggests that the approach generalizes well and is practical for personalized dialogue systems.

Abstract

Personalized dialogue generation aims to leverage persona profiles and dialogue history to generate persona-relevant and consistent responses. Mainstream models typically rely on token-level language model training on persona dialogue data, such as Next Token Prediction, to achieve personalization implicitly. As a result, these methods tend to neglect the given personas and generate generic responses. To address this issue, we propose a novel Persona-Aware Alignment Framework (PAL), which directly treats persona alignment as the training objective of dialogue generation. Specifically, PAL employs a two-stage training method comprising Persona-aware Learning and Persona Alignment, together with an easy-to-use inference strategy, Select then Generate, to improve persona sensitivity and generate more persona-relevant responses at the semantic level. Through extensive experiments, we demonstrate that our framework outperforms many state-of-the-art personalized dialogue methods and large language models.
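
The TL;DR states that the Persona Alignment stage applies Direct Preference Optimization (DPO) to pairs of gold and model-generated responses. As a rough illustration only, the sketch below computes the standard DPO loss for a single pair from sequence log-probabilities. It assumes the gold response is treated as the preferred one; the function name `dpo_loss`, its argument names, and the default `beta` are hypothetical and not taken from the paper, whose exact alignment loss may differ.

```python
import math


def dpo_loss(policy_logp_gold: float, policy_logp_gen: float,
             ref_logp_gold: float, ref_logp_gen: float,
             beta: float = 0.1) -> float:
    """Standard DPO loss for one pair, assuming gold is preferred over generated.

    Each argument is the summed log-probability of a full response under
    either the trainable policy or the frozen reference model.
    """
    # Implicit reward of each response: beta * log(policy / reference).
    reward_gold = beta * (policy_logp_gold - ref_logp_gold)
    reward_gen = beta * (policy_logp_gen - ref_logp_gen)
    margin = reward_gold - reward_gen

    # loss = -log(sigmoid(margin)), written in a numerically stable form.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


if __name__ == "__main__":
    # The policy assigns more probability to the gold response than the
    # reference does, so the loss falls below log(2).
    print(dpo_loss(-4.0, -7.0, -5.0, -7.0))
```

When the policy and the reference agree on both responses, the margin is zero and the loss equals log 2; the loss shrinks as the policy shifts probability toward the gold response relative to the reference.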

Paper Structure

This paper contains 28 sections, 8 equations, 4 figures, 6 tables.

Figures (4)

  • Figure 1: An example of personalized dialogue. ✓ marks a persona-consistent response and ✗ marks a generic response that does not reflect the persona.
  • Figure 2: Overview of our Persona-Aware Alignment Framework (PAL), which comprises a two-stage training strategy, (1) Persona-aware Learning and (2) Persona Alignment, as well as a Select-then-Generate inference strategy. The arrows trace the flow of information, showing how each stage converts its inputs into outputs. In the Persona-aware Learning stage, the inputs are the persona descriptions $P^i$ and dialogue context $C^i$. A multi-task prompt-construction module turns these inputs into training samples $s^{i,k}$, which are used for prompt tuning, with the training loss $\mathcal{L}_{MT}$ as the output of this stage. In the Persona Alignment stage, the inputs are the persona descriptions $P^i$, dialogue context $C^i$, and the gold response $r^{i}_{\text{gold}}$. An alignment-pair constructor forms $(r^{i}_{\text{gold}}, r^{i}_{\text{gen}})$, where $r^{i}_{\text{gen}}$ is produced by the model from the previous stage. Alignment training on these pairs yields the alignment loss $\mathcal{L}_{PA}$, the output of this stage. In the Select-then-Generate inference strategy, the inputs are the persona descriptions $P^i$ and dialogue context $C^i$. A selection module picks the persona $\hat{p}^{i}$ most relevant to the context. A response generator then produces the final reply $r$, with the selected persona explicitly highlighted in its input.
  • Figure 3: The Results of Different Inference Strategies on the Original PERSONA-CHAT Dataset.
  • Figure 4: Influence of Persona Alignment Training Steps on the PERSONA-CHAT Dataset.
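
The Select-then-Generate flow described in the Figure 2 caption (select the most relevant persona, then generate with that persona highlighted) can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' implementation: `build_prompt`, `select_then_generate`, the prompt layout, and the word-overlap `overlap_selector` are all hypothetical. In PAL the selection and generation steps are carried out by the trained model, which is represented here by the caller-supplied `select_fn` and `generate_fn`.

```python
from typing import Callable, List


def build_prompt(personas: List[str], context: List[str], selected: str) -> str:
    """Format a generation prompt that explicitly highlights the selected persona.

    The layout is an assumption made for illustration only.
    """
    persona_block = "\n".join(f"- {p}" for p in personas)
    history = "\n".join(context)
    return (
        f"Personas:\n{persona_block}\n"
        f"Most relevant persona: {selected}\n"
        f"Dialogue:\n{history}\n"
        f"Response:"
    )


def select_then_generate(
    personas: List[str],
    context: List[str],
    select_fn: Callable[[List[str], List[str]], str],
    generate_fn: Callable[[str], str],
) -> str:
    """Step 1: select a persona. Step 2: generate with it highlighted."""
    selected = select_fn(personas, context)
    return generate_fn(build_prompt(personas, context, selected))


def overlap_selector(personas: List[str], context: List[str]) -> str:
    """Toy stand-in for the model's selection step (not the paper's method):
    pick the persona sharing the most words with the last utterance."""
    last_words = set(context[-1].lower().split())
    return max(personas, key=lambda p: len(last_words & set(p.lower().split())))


if __name__ == "__main__":
    personas = ["i have two dogs", "i work as a nurse"]
    context = ["what do you do for work ?"]
    # An echo "generator" keeps the example runnable without a model.
    print(select_then_generate(personas, context, overlap_selector, lambda p: p))
```

Passing the selector and generator as callables keeps the two steps separable, so a real model call can replace either stand-in without changing the control flow.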