MoCoRP: Modeling Consistent Relations between Persona and Response for Persona-based Dialogue

Kyungro Lee, Dongha Choi, Hyunju Lee

TL;DR

MoCoRP tackles the lack of explicit relations between persona sentences and responses in persona-based dialogue by introducing an NLI expert to predict entailment/neutral/contradiction relations and integrating these relations into both BART and LLMs through a two-stage training process. The framework includes relation learning on NLI data and dialogue learning that fuses relation vectors with decoder inputs, plus an extension to LLMs using alignment tuning, SFT, and DPO. Empirical results on ConvAI2 and MPChat show improvements in persona consistency and engagingness, with qualitative evaluations corroborating stronger grounding in persona and context. Overall, MoCoRP demonstrates that explicitly modeling relations between persona and response improves coherent, persona-grounded dialogue across model scales. The code is publicly released.

Abstract

As dialogue systems become increasingly important across various domains, a key challenge in persona-based dialogue is generating engaging and context-specific interactions while ensuring the model acts with a coherent personality. However, existing persona-based dialogue datasets lack explicit relations between persona sentences and responses, which makes it difficult for models to effectively capture persona information. To address these issues, we propose MoCoRP (Modeling Consistent Relations between Persona and Response), a framework that incorporates explicit relations into language models. MoCoRP leverages an NLI expert to explicitly extract the NLI relations between persona sentences and responses, enabling the model to effectively incorporate appropriate persona information from the context into its responses. We applied this framework to pre-trained models like BART and further extended it to modern large language models (LLMs) through alignment tuning. Experimental results on the public datasets ConvAI2 and MPChat demonstrate that MoCoRP outperforms existing baselines, achieving superior persona consistency and engaging, context-aware dialogue generation. Furthermore, our model not only excels in quantitative metrics but also shows significant improvements in qualitative aspects. These results highlight the effectiveness of explicitly modeling persona-response relations in persona-based dialogue. The source code of MoCoRP is available at https://github.com/DMCB-GIST/MoCoRP.

Paper Structure

This paper contains 25 sections, 9 equations, 4 figures, 11 tables.

Figures (4)

  • Figure 1: Example of persona-based dialogue from the ConvAI2 dataset. The relations between the model's persona and utterances are represented by blue lines for entailment and red lines for contradiction, while relations not indicated are neutral. In the original dataset, the relations between persona sentences and response are not provided.
  • Figure 2: Overall architecture of the proposed MoCoRP for persona-based dialogue. The NLI expert is trained to predict NLI labels using the Dialogue NLI dataset (A), and BART learns the relation prediction capability from the NLI expert through relation learning and dialogue learning (B). Special tokens are used to structure the input: $[m]$ represents the mask token from the tokenizer, $[q]$ indicates that the following token sequence is a user query, and $[b]$ marks a previous bot utterance. The BART decoder start token, $[r]$, is transformed into a token embedding through the embedding layer. This embedding is then combined with the relation vector $z_{rel}$ to model the relations between the sentences associated with the $[m]$ and the decoder input.
  • Figure 3: Overall architecture of the proposed MoCoRP LLM for persona-based dialogue. The prior LLM is trained to generate the target response based on the given system message and dialogue history (top left). After completing alignment tuning of the prior model, the NLI expert calculates the NLI labels between the persona sentences and the prior response generated by the prior LLM (top right). Using these NLI labels along with the given input, the posterior model learns to generate the target response by maximizing its probability conditioned on the input context (bottom).
  • Figure 4: Input prompts for the prior (top) and posterior (bottom) models in MoCoRP LLM. [SYSTEM], [QUERY], and [BOT] indicate system message, user query, and bot utterance, respectively. A "/" preceding these tokens refers to the end of the corresponding role. The text in red represents the reasoning process that the posterior model needs to additionally reference, while the green text indicates the NLI relations between the persona sentences and the prior response. Some utterances of the dialogue history are omitted, and the target response is highlighted in blue.
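
The prior-to-posterior flow described for Figures 3 and 4 can be illustrated with a minimal Python sketch: label each persona sentence against a prior response, then place those labels in the posterior prompt. Everything concrete here is an assumption made for illustration and is not taken from the paper or its repository: the function names, the `[/ROLE]` closing-tag spelling, the prompt wording, and above all `toy_nli_expert`, which is a word-overlap stand-in for a trained NLI classifier and never predicts contradiction.

```python
from typing import Callable, List, Tuple

NLI_LABELS = ("entailment", "neutral", "contradiction")
STOPWORDS = {"i", "a", "as", "my", "me", "the"}


def toy_nli_expert(premise: str, hypothesis: str) -> str:
    """Hypothetical stand-in for a trained NLI expert (word overlap only)."""
    shared = set(premise.lower().split()) & set(hypothesis.lower().split())
    return "entailment" if shared - STOPWORDS else "neutral"


def label_persona_relations(
    persona: List[str],
    prior_response: str,
    nli_expert: Callable[[str, str], str],
) -> List[str]:
    """Return one NLI label per persona sentence, relative to the prior response."""
    labels = []
    for sentence in persona:
        label = nli_expert(sentence, prior_response)
        if label not in NLI_LABELS:
            raise ValueError(f"unexpected NLI label: {label!r}")
        labels.append(label)
    return labels


def build_posterior_prompt(
    persona: List[str],
    history: List[Tuple[str, str]],
    prior_response: str,
    labels: List[str],
) -> str:
    """Assemble an illustrative posterior prompt; the layout is assumed."""
    if len(labels) != len(persona):
        raise ValueError("need exactly one label per persona sentence")
    parts = ["[SYSTEM]", "Your persona:"]
    parts += [f"- {sentence}" for sentence in persona]
    parts.append("[/SYSTEM]")
    # history holds (role, text) pairs, with role either "QUERY" or "BOT"
    for role, text in history:
        parts.append(f"[{role}] {text} [/{role}]")
    parts.append(f"Prior response: {prior_response}")
    parts.append("Relations between persona and prior response:")
    parts += [f"- {s} -> {l}" for s, l in zip(persona, labels)]
    parts.append("[BOT]")  # the posterior model would continue from here
    return "\n".join(parts)


persona = ["i love dogs", "i work as a teacher"]
prior = "my dogs keep me busy"
labels = label_persona_relations(persona, prior, toy_nli_expert)
prompt = build_posterior_prompt(persona, [("QUERY", "any pets ?")], prior, labels)
print(prompt)
```

Passing the expert in as a callable keeps the sketch independent of any particular classifier, so a real NLI model could replace the toy function without changing the prompt-building code.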