Personalized Dialogue Generation with Persona-Adaptive Attention

Qiushi Huang, Yu Zhang, Tom Ko, Xubo Liu, Bo Wu, Wenwu Wang, Lilian Tang

TL;DR

A framework built on Persona-Adaptive Attention (PAA), which adaptively weights persona and context information through a purpose-built attention mechanism. A dynamic masking mechanism applied to the PAA drops redundant information in the context and persona and also acts as a regularizer against overfitting.

Abstract

Persona-based dialogue systems aim to generate consistent responses based on historical context and a predefined persona. Unlike conventional dialogue generation, persona-based dialogue generation needs to consider both the dialogue context and the persona, posing a challenge for coherent training. Specifically, this requires a delicate weight balance between context and persona. To achieve that, in this paper, we propose an effective framework with Persona-Adaptive Attention (PAA), which adaptively integrates the weights from the persona and context information via our designed attention. In addition, a dynamic masking mechanism is applied to the PAA to not only drop redundant information in context and persona but also serve as a regularization mechanism to avoid overfitting. Experimental results demonstrate the superiority of the proposed PAA framework over strong baselines in both automatic and human evaluation. Moreover, the proposed PAA approach performs comparably well in a low-resource regime: with only 20% to 30% of the data, it achieves results similar to those of larger models trained in the full-data setting. To fully exploit the effectiveness of our design, we designed several variants that handle the weighted information in different ways, showing the necessity and sufficiency of our weighting and masking designs.

Paper Structure

This paper contains 36 sections, 14 equations, 5 figures, and 7 tables.

Figures (5)

  • Figure 1: An example from the ConvAI2 dataset.
  • Figure 2: (a) The overview of our framework, including two encoders for persona and context, respectively, and a decoder with PAA to generate a response. (b) The PAA architecture balances the information flows from two sources of input by generating dynamic masks.
  • Figure 3: Comparison with GPT2 under a low-resource scenario. We sampled 10% to 90% of the training data to train GPT2-SMALL, GPT2-MEDIUM, and PAA.
  • Figure 4: The variants of PAA. (a) Dual Weights Attention: the weights for the two cross-attentions are calculated separately; (b) Skipped Weight Attention: the weight and mask apply only to the persona cross-attention; (c) Context-Adaptive Attention: the weights and masks are inferred from the context cross-attention rather than the persona cross-attention; (d) DirectSUM: the two cross-attention outputs are summed directly; (e) Parametric Attention: the two cross-attention results are processed by a feed-forward network.
  • Figure 5: PyTorch-style Pseudo-code for PAA Training
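
The captions for Figure 2(b) and Figure 4(d) describe two ways of combining the persona and context cross-attention outputs: a weighted, masked fusion (PAA) and a plain sum (DirectSUM). The dependency-free Python sketch below illustrates the difference for a single decoder position. It is one plausible reading of those captions, not the authors' implementation, and it does not reproduce the pseudo-code of Figure 5. The names `paa_fuse`, `direct_sum`, `dynamic_threshold`, and `gate_logits` are hypothetical. Three details are assumptions: the gate is a sigmoid over logits that a learned layer would produce, the context weight is the complement of the persona weight, and the mask threshold is the context share of the total input length.

```python
import math
from typing import List

Vector = List[float]


def sigmoid(x: float) -> float:
    """Logistic function, used to squash a gate logit into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def dynamic_threshold(context_len: int, persona_len: int) -> float:
    """ASSUMPTION: the mask threshold is the context share of the input length."""
    return context_len / (context_len + persona_len)


def paa_fuse(o_persona: Vector, o_context: Vector,
             gate_logits: Vector, tau: float) -> Vector:
    """Weighted, masked fusion of two cross-attention outputs (illustrative).

    o_persona / o_context: cross-attention outputs for one decoder position.
    gate_logits: hypothetical per-dimension logits; in a real model a learned
    layer would produce them, here they are passed in directly.
    """
    fused = []
    for o_p, o_c, logit in zip(o_persona, o_context, gate_logits):
        w_p = sigmoid(logit)             # persona weight in (0, 1)
        w_c = 1.0 - w_p                  # ASSUMPTION: complementary context weight
        m_p = 1.0 if w_p > tau else 0.0  # binary mask: drop weakly weighted persona info
        m_c = 1.0 if w_c > tau else 0.0  # binary mask: drop weakly weighted context info
        fused.append(w_p * m_p * o_p + w_c * m_c * o_c)
    return fused


def direct_sum(o_persona: Vector, o_context: Vector) -> Vector:
    """DirectSUM variant: add the two cross-attention outputs without weighting."""
    return [o_p + o_c for o_p, o_c in zip(o_persona, o_context)]


# Toy usage with made-up numbers.
tau = dynamic_threshold(context_len=30, persona_len=70)
fused = paa_fuse([1.0, 2.0], [10.0, 20.0], gate_logits=[0.0, 4.0], tau=tau)
summed = direct_sum([1.0, 2.0], [10.0, 20.0])
```

In this toy example the second dimension has a persona weight close to 1, so its context weight falls below the threshold and the context contribution is masked out entirely. DirectSUM, by contrast, always keeps both sources at full strength.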