BootTOD: Bootstrap Task-oriented Dialogue Representations by Aligning Diverse Responses

Weihao Zeng, Keqing He, Yejie Wang, Dayuan Fu, Weiran Xu

TL;DR

BootTOD uses multiple appropriate response targets to model the intrinsic one-to-many diversity of human conversations, and experimental results show that it outperforms strong TOD baselines on diverse downstream dialogue tasks.

Abstract

Pre-trained language models have been successful in many scenarios. However, their usefulness in task-oriented dialogues is limited by the intrinsic linguistic differences between general text and task-oriented dialogues. Current task-oriented dialogue pre-training methods rely on a contrastive framework, which faces challenges such as selecting true positives and hard negatives, as well as a lack of diversity. In this paper, we propose a novel dialogue pre-training model called BootTOD. It learns task-oriented dialogue representations via a self-bootstrapping framework. Unlike its contrastive counterparts, BootTOD aligns context and context+response representations and removes the need for contrastive pairs. BootTOD also uses multiple appropriate response targets to model the intrinsic one-to-many diversity of human conversations. Experimental results show that BootTOD outperforms strong TOD baselines on diverse downstream dialogue tasks.
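
The idea of aligning a context representation with several context+response targets, without any negative pairs, can be illustrated with a minimal sketch. This is not the paper's implementation: the function names, the choice of squared distance between L2-normalized vectors, and the toy vectors are all assumptions made for illustration, and a real model would produce these vectors with an encoder.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (returned unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def alignment_loss(context_vec, target_vecs):
    """Hypothetical non-contrastive alignment objective.

    Averages the squared distance between the normalized context vector
    and each normalized context+response target. Only positive targets
    are used; no negative samples are required.
    """
    p = l2_normalize(context_vec)
    losses = []
    for target in target_vecs:
        t = l2_normalize(target)
        losses.append(sum((a - b) ** 2 for a, b in zip(p, t)))
    return sum(losses) / len(losses)


# Toy example: one context vector and two candidate response targets.
context = [1.0, 0.0]
targets = [[2.0, 0.0], [0.0, 1.0]]
loss = alignment_loss(context, targets)
```

For unit vectors this distance equals 2 minus twice the cosine similarity, so a target pointing in the same direction as the context contributes zero and an orthogonal target contributes 2. Averaging over several targets is one simple way to reflect that a single context can have more than one appropriate response.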

Paper Structure (15 sections, 3 equations, 4 figures, 9 tables)

Figures (4)

  • Figure 1: The same context may have multiple appropriate responses in a task-oriented dialogue.
  • Figure 2: Overall architecture of our proposed BootTOD.
  • Figure 3: Ablation study of alignment layers. We report the results of dialogue act prediction on DSTC2. The X-axis and Y-axis denote the number of layers used for alignment and the performance, respectively.
  • Figure 4: Ablation study of max future length $P$. We report the results of dialogue act prediction on DSTC2. The X-axis and Y-axis denote the max future length $P$ and the performance, respectively.