A Stack-Propagation Framework for Low-Resource Personalized Dialogue Generation

Haoyu Song, Wei-Nan Zhang, Kaiyan Zhang, Ting Liu

TL;DR

Under different low-resource settings, subjective and objective evaluations show that the stack-propagation framework outperforms strong baselines in response quality and persona consistency, and largely overcomes the shortcoming of traditional models that rely heavily on persona-dense dialogue data.

Abstract

With the resurgent interest in building open-domain dialogue systems, the dialogue generation task has attracted increasing attention over the past few years. This task is usually formulated as a conditional generation problem, which aims to generate a natural and meaningful response given dialogue contexts and specific constraints, such as persona. Maintaining a consistent persona is essential for dialogue systems to gain users' trust. Despite substantial progress, traditional persona-based dialogue models are typically trained on a large number of persona-dense dialogue examples. Such persona-dense training data are expensive to obtain, which limits their scale. This work presents a novel approach to learning from limited training examples by treating consistency understanding as a regularization of response generation. To this end, we propose a novel stack-propagation framework for learning a generation and understanding pipeline. Specifically, the framework stacks a Transformer encoder and two Transformer decoders, where the first decoder models response generation and the second serves as a regularizer that jointly models response generation and consistency understanding. Thanks to the stacked encoder and decoders, the proposed framework can learn from much smaller personalized dialogue data while maintaining competitive performance. Under different low-resource settings, subjective and objective evaluations show that the stack-propagation framework outperforms strong baselines in response quality and persona consistency, and largely overcomes the shortcoming of traditional models that rely heavily on persona-dense dialogue data.

Paper Structure

This paper contains 51 sections, 22 equations, 5 figures, 20 tables.

Figures (5)

  • Figure 1: Examples of personality expressions in dialogue responses. (a): An instance from the PersonaChat dataset that shows a consistent personality. (b): A 12-layer GPT-2 finetuned on the consistent PersonaChat dataset still generates an inconsistent response. (c) and (d): Cases of the persona sparsity issue in datasets collected from social media, namely English Twitter and Chinese Weibo, respectively.
  • Figure 2: (a) The differences between an auto-regressive language model and a masked language model. When predicting a token, the auto-regressive language model can only attend to the left context, while the masked language model can attend to all tokens. (b) Illustrations of pipeline stacking and stack-propagation. The pipeline approach only back-propagates within each task-specific model and does not allow back-propagation between tasks. In contrast, stack-propagation uses a continuous and differentiable link between two tasks, allowing back-propagation from Task B into Task A's model, e.g., from consistency understanding to dialogue generation.
  • Figure 3: An overview of the proposed stack-propagation framework EDU. There are three Transformer blocks: an encoder (denoted as $\mathbb{E}$), a response generation decoder (denoted as $\mathbb{D}$) with a causal attention mask, and a consistency understanding regularizer (denoted as $\mathbb{U}$) with a model-specific attention mask, which depends on its initialization model. The tasks of personalized dialogue generation and consistency understanding are jointly modeled. Accordingly, two losses are back-propagated: a dialogue generation loss and a consistency understanding loss. The dialogue generation loss is a negative log-likelihood (NLL) objective and back-propagates from $\mathbb{D}$ and $\mathbb{E}$ to the Transformer embedding layer. The consistency understanding loss is a combination of negative log-likelihood and unlikelihood objectives, and it back-propagates from the understanding regularizer $\mathbb{U}$ through $\mathbb{D}$ and $\mathbb{E}$ to the embedding layer. The architecture design and the loss back-propagation strategy together form our stack-propagation framework.
  • Figure 4: Performance of different models (best viewed in color) under different amounts of personalized data (PersonaChat). The abscissa represents the amount of training data used. Left: the automatic metric perplexity (lower is better). Right: the human evaluation score of persona consistency, i.e., Per.C. (higher is better).
  • Figure 5: Performance of EDU$_{\text{BoB}}$ and its ablations (best viewed in color) under different amounts of personalized data (PersonaChat). The abscissa represents the amount of training data used. Left: the automatic metric perplexity (lower is better). Right: the human evaluation score of persona consistency, i.e., Per.C. (higher is better).
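
The captions of Figures 2 and 3 mention two attention patterns (left-context-only versus all tokens) and two kinds of objectives (negative log-likelihood and unlikelihood). The following is a minimal, dependency-free Python sketch of those ideas, not the authors' implementation. All function names are hypothetical, and the rule for choosing which examples receive the unlikelihood term (a `contradicts` flag) and the weight `alpha` are assumptions made for illustration only.

```python
import math


def causal_mask(n):
    """mask[i][j] is True when position i may attend to position j.

    Auto-regressive setting: only the left context (j <= i) is visible.
    """
    return [[j <= i for j in range(n)] for i in range(n)]


def bidirectional_mask(n):
    """Masked-language-model setting: every position may attend to all tokens."""
    return [[True] * n for _ in range(n)]


def nll_loss(probs, targets):
    """Mean negative log-likelihood of the target tokens.

    probs[t] is the predicted distribution at step t (a list indexed by token id).
    """
    return -sum(math.log(p[t]) for p, t in zip(probs, targets)) / len(targets)


def unlikelihood_loss(probs, targets, eps=1e-12):
    """Mean of -log(1 - p(token)): penalizes probability mass on the given tokens."""
    total = sum(math.log(max(1.0 - p[t], eps)) for p, t in zip(probs, targets))
    return -total / len(targets)


def understanding_loss(probs, targets, contradicts, alpha=1.0):
    """Assumed combination: unlikelihood for contradictory examples, NLL otherwise.

    `contradicts` and `alpha` are illustrative assumptions, not taken from the paper.
    """
    if contradicts:
        return alpha * unlikelihood_loss(probs, targets)
    return nll_loss(probs, targets)


if __name__ == "__main__":
    for row in causal_mask(3):
        print(row)
    toy_probs = [[0.5, 0.5], [0.25, 0.75]]
    toy_targets = [0, 1]
    print("NLL:", nll_loss(toy_probs, toy_targets))
    print("Unlikelihood:", unlikelihood_loss(toy_probs, toy_targets))
```

In an actual stack-propagation setup, both losses would be computed on differentiable model outputs so that the understanding loss can propagate gradients back through the generation decoder and the encoder; the plain-Python numbers here only illustrate how each objective scores a toy distribution.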