Instruct Once, Chat Consistently in Multiple Rounds: An Efficient Tuning Framework for Dialogue

Jian Wang, Chak Tou Leong, Jiashuo Wang, Dongding Lin, Wenjie Li, Xiao-Yong Wei

TL;DR

This work tackles the challenge of maintaining consistency across multiple dialogue rounds by reframing tuning as a role-aware, interactive process. It introduces Midi-Tuning, which uses two LoRA-based adapters to model the agent and the user separately, coupled with a round-level memory caching mechanism that efficiently preserves prior context. Across the Light and TopDial datasets and multiple 7B-sized backbones, Midi-Tuning shows superior consistency and comparable overall quality versus standard fine-tuning, as validated by automatic metrics, GPT-4 assessments, and human judgments. The approach demonstrates the practical value of aligning tuning strategies with the inherently interactive, role-based nature of dialogue, while noting limitations in padding overhead and compute efficiency that future work can address.

Abstract

Tuning language models for dialogue generation has been a prevalent paradigm for building capable dialogue agents. Yet, traditional tuning narrowly views dialogue generation as resembling other language generation tasks, ignoring the role disparities between the two speakers and the multi-round interactive process that dialogue ought to be. This approach often leads to unsatisfactory chat consistency for the resulting agent. In this work, we emphasize the interactive, communicative nature of dialogue and argue that it is more feasible to model the speaker roles of agent and user separately, enabling the agent to adhere to its role consistently. With this in mind, we propose an efficient Multi-round Interactive Dialogue Tuning (Midi-Tuning) framework. It models the agent and user individually with two adapters built upon large language models. The adapters process their respective utterances round by round in alternating order, and they are tuned via a round-level memory caching mechanism. Extensive experiments demonstrate that our framework outperforms traditional fine-tuning and has great potential for improving dialogue consistency.
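
The alternating, round-by-round use of two role-specific adapters with a round-level cache can be pictured with a small bookkeeping sketch. This is only a toy illustration under stated assumptions, not the authors' implementation: no language model or LoRA weights are involved, the names `RoundCache`, `process_utterance`, and `run_dialogue` are hypothetical, whitespace splitting stands in for tokenization, and routing the one-time instruction through the agent side is an assumption made for the example.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class RoundCache:
    """Toy stand-in for cached per-round states (here: just token lists)."""
    rounds: List[Tuple[str, List[str]]] = field(default_factory=list)

    def cached_tokens(self) -> int:
        # Number of tokens already stored from earlier rounds.
        return sum(len(tokens) for _, tokens in self.rounds)

    def add(self, role: str, tokens: List[str]) -> None:
        self.rounds.append((role, tokens))


def process_utterance(role: str, text: str, cache: RoundCache) -> Dict[str, object]:
    """Record which (hypothetical) adapter handles this utterance and how
    much earlier context is reused from the cache instead of re-encoded."""
    tokens = text.split()           # assumption: whitespace "tokenizer"
    reused = cache.cached_tokens()  # context available before this round
    cache.add(role, tokens)         # store this round for later rounds
    return {"adapter": role, "new_tokens": len(tokens), "reused_tokens": reused}


def run_dialogue(instruction: str, turns: List[Tuple[str, str]]) -> List[Dict[str, object]]:
    """Give the instruction once, then alternate between the two roles."""
    cache = RoundCache()
    # Assumption: the instruction is seen once, on the agent side.
    log = [process_utterance("agent", instruction, cache)]
    for role, text in turns:
        log.append(process_utterance(role, text, cache))
    return log


if __name__ == "__main__":
    example_turns = [
        ("user", "hi there"),
        ("agent", "hello traveler how are you"),
    ]
    for step in run_dialogue("You are a knight", example_turns):
        print(step)
```

In this sketch each utterance is encoded only once, and later rounds read earlier rounds from the cache. That is the intuition behind giving the instruction a single time rather than re-feeding the whole history at every turn.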

Paper Structure (39 sections, 5 equations, 12 figures, 7 tables)

Figures (12)

  • Figure 1: Comparison of different tuning manners (including data usage) for dialogue generation.
  • Figure 2: Overview of the proposed Multi-round Interactive Dialogue Tuning (Midi-Tuning) framework.
  • Figure 3: Overview of the round-level memory caching.
  • Figure 4: Performance of the created consistency estimator on the Light validation set.
  • Figure 5: Per-round consistency comparison between the fine-tuning (FT) and Midi-Tuning (Ours) on the Light test-unseen set.
  • ...and 7 more figures