Template-assisted Contrastive Learning of Task-oriented Dialogue Sentence Embeddings

Minsik Oh, Jiwei Li, Guoyin Wang

TL;DR

This work addresses the challenge of learning effective sentence embeddings for task-oriented dialogue by leveraging token-level template information. It introduces TaDSE, which combines template-based data augmentation, utterance-template pairwise contrastive learning, and a semantic compression inference to align utterances with their semantic templates. Across five benchmark dialogue datasets, TaDSE yields significant improvements over state-of-the-art unsupervised methods, with analyses linking semantic compression to uniformity and alignment in the embedding space. The approach highlights the value of incorporating template and slot-level knowledge into dialogue representations and provides new tools for diagnosing and understanding embedding structure.

Abstract

Learning high-quality sentence embeddings from dialogues has drawn increasing attention, as it is essential for solving a variety of dialogue-oriented tasks at low annotation cost. Annotating and gathering utterance relationships in conversations is difficult, while token-level annotations, e.g., entities, slots, and templates, are much easier to obtain. Other sentence embedding methods are usually sentence-level self-supervised frameworks and cannot utilize this extra token-level knowledge. We introduce Template-aware Dialogue Sentence Embedding (TaDSE), a novel augmentation method that utilizes template information to learn utterance embeddings via a self-supervised contrastive learning framework. We further enhance the effect with a synthetically augmented dataset that diversifies utterance-template associations, for which slot-filling is a preliminary step. We evaluate TaDSE on five downstream benchmark dialogue datasets. The experimental results show that TaDSE achieves significant improvements over previous SOTA methods for dialogue. We further introduce a novel analytic instrument, the semantic compression test, for which we discover a correlation with uniformity and alignment. Our code will be released upon acceptance.
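
To make the idea of pairing utterances with templates in a contrastive objective concrete, here is a minimal sketch. It assumes a generic InfoNCE-style loss with cosine similarity and in-batch negatives; the function names, the temperature value, and the toy 2-d embeddings are illustrative assumptions and are not taken from the paper, whose exact loss may differ.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def pairwise_contrastive_loss(utt_embs, tmpl_embs, temperature=0.05):
    """InfoNCE-style loss (assumed form): utterance i is positive with
    template i; every other template in the batch acts as a negative."""
    losses = []
    for i, u in enumerate(utt_embs):
        logits = [cosine(u, t) / temperature for t in tmpl_embs]
        # Log-sum-exp with max subtraction for numerical stability.
        m = max(logits)
        log_denominator = m + math.log(sum(math.exp(x - m) for x in logits))
        losses.append(log_denominator - logits[i])
    return sum(losses) / len(losses)

# Toy embeddings, made up for illustration only.
utterances = [[1.0, 0.1], [0.1, 1.0]]
templates_aligned = [[1.0, 0.0], [0.0, 1.0]]   # template i matches utterance i
templates_swapped = [[0.0, 1.0], [1.0, 0.0]]   # template i mismatches utterance i

loss_aligned = pairwise_contrastive_loss(utterances, templates_aligned)
loss_swapped = pairwise_contrastive_loss(utterances, templates_swapped)
print(loss_aligned, loss_swapped)
```

The loss is small when each utterance sits closest to its own template and large when the pairing is swapped, which is the behavior a contrastive objective of this kind rewards.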

Paper Structure (24 sections, 7 equations, 11 figures, 12 tables)


Figures (11)

  • Figure 1: Embedding hyperspace changes with our method, from (a), (b) to (c). Ellipses denote sentence representations from the dataset, belonging to unique semantic groups. (a) shows the limited original data, (b) shows the effect of noisy data augmentation in which semantic clusters overlap, and (c) shows enhanced semantic group separation with our methods, with templates within each semantic group to constrain the embeddings.
  • Figure 2: Our template contrastive learning methods. The first diagram displays template contrastive learning ($L^t$), the second diagram displays utterance contrastive learning ($L^u$), and the third diagram displays pairwise contrastive learning ($L^\textit{pair}$). Encoder denotes the embedding generation model; yellow and green represent template and utterance representations, respectively. Solid bidirectional arrows designate positive pairs and dashed bidirectional arrows designate negative pairs.
  • Figure 3: Our template data augmentation process in a simplified example with a single template. In practice, thousands of templates and slot values exist per dataset (Table \ref{tab:dataset_aug}). We experiment with both manual annotations and an automated slot-filling method.
  • Figure 4: Our embedding generation process. Blue and red dashed lines are examples of positive and negative pairs for the $L^{\textit{pair}}$ loss. Dashed arrows depict an alternative choice of semantic compression inference technique. Slot-filling baselines and template sources are described in Section \ref{sec:data}.
  • Figure 5: t-SNE diagrams for SNIPS models. Left: SimCSE; middle: TaDSE; right: TaDSE-compressed $0.5$. Embeddings are color-coded according to their labels; red and blue embeddings correspond to the PlayMusic and AddToPlaylist labels, respectively. We circle the increased sparsity near the effective decision boundaries and show a magnified view at the lower right. Note that more compression does not always result in better performance. ATIS diagrams are in Figs. \ref{fig:atis_base}, \ref{fig:atis_train}, and \ref{fig:atis_optimal}.
  • ...and 6 more figures
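
The single-template augmentation described in the Figure 3 caption can be sketched as follows. This is an illustration under stated assumptions rather than the authors' pipeline: the `{slot}` placeholder notation, the `augment` helper, and the example template and slot values are all hypothetical.

```python
import re
from itertools import product

# Assumed placeholder syntax: slots are written as {slot_name}.
SLOT_PATTERN = re.compile(r"\{(\w+)\}")

def augment(template, slot_values):
    """Return (utterance, template) pairs, one per combination of slot values."""
    slots = SLOT_PATTERN.findall(template)
    # dict.fromkeys keeps first-seen order and drops repeated slot names.
    unique_slots = list(dict.fromkeys(slots))
    pairs = []
    for combo in product(*(slot_values[s] for s in unique_slots)):
        filling = dict(zip(unique_slots, combo))
        utterance = SLOT_PATTERN.sub(lambda m: filling[m.group(1)], template)
        pairs.append((utterance, template))
    return pairs

# Hypothetical template and slot values, for illustration only.
template = "play {artist} on {service}"
slot_values = {"artist": ["Adele", "Queen"], "service": ["Spotify", "Deezer"]}

pairs = augment(template, slot_values)
for utterance, source in pairs:
    print(utterance, "<-", source)
```

Each generated utterance stays paired with the template it came from, so the output can serve directly as utterance-template positives in a contrastive setup.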