Can Your Model Tell a Negation from an Implicature? Unravelling Challenges With Intent Encoders

Yuwei Zhang, Siffi Singh, Sailik Sengupta, Igor Shalyminov, Hang Su, Hwanjun Song, Saab Mansour

TL;DR

This work proposes an intent semantic toolkit that gives a more holistic view of intent embedding models by considering three tasks: intent classification, intent clustering, and a novel triplet task. It also proposes a training approach that improves the semantic understanding of intent embedding models on negation and implicature while slightly affecting their performance on downstream task metrics.

Abstract

Conversational systems often rely on embedding models for intent classification and intent clustering tasks. The advent of Large Language Models (LLMs), which enable instructional embeddings that allow one to adjust semantics over the embedding space using prompts, is being viewed as a panacea for these downstream conversational tasks. However, traditional evaluation benchmarks rely solely on task metrics that do not specifically measure gaps in semantic understanding. Thus, we propose an intent semantic toolkit that gives a more holistic view of intent embedding models by considering three tasks: (1) intent classification, (2) intent clustering, and (3) a novel triplet task. The triplet task gauges the model's understanding of two semantic concepts paramount in real-world conversational systems: negation and implicature. We observe that current embedding models fare poorly at understanding these concepts. To address this, we propose a pre-training approach that augments the training data with samples generated by an auto-regressive model and adds a contrastive loss term. Our approach improves the semantic understanding of the intent embedding model on these linguistic dimensions while slightly affecting its performance on downstream task metrics.
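The triplet task can be illustrated with a minimal sketch. It assumes (our assumption, not necessarily the paper's exact protocol) that each test triplet pairs an original utterance with an implicature (same intent, so it should embed closer) and a negation (opposite intent, so it should embed farther), and that a model passes a triplet when the original's cosine similarity to the implicature exceeds its similarity to the negation. The function names `cosine` and `triplet_accuracy` and the toy vectors are hypothetical.

```python
import math
from typing import Sequence

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def triplet_accuracy(triplets: Sequence[tuple[Vector, Vector, Vector]]) -> float:
    """Fraction of (original, implicature, negation) embedding triplets in which
    the original is closer to its implicature than to its negation.

    Hypothetical scoring rule, assumed for illustration only.
    """
    if not triplets:
        raise ValueError("need at least one triplet")
    passed = sum(
        1
        for orig, impl, neg in triplets
        if cosine(orig, impl) > cosine(orig, neg)
    )
    return passed / len(triplets)


# Toy 3-d embeddings (made up, not real model outputs).
good = ([1.0, 0.0, 0.0], [0.9, 0.1, 0.0], [0.0, 1.0, 0.0])   # implicature is closer
bad = ([1.0, 0.0, 0.0], [0.0, 0.0, 1.0], [0.95, 0.05, 0.0])  # negation is closer (failure mode of Figure 2)
print(triplet_accuracy([good, bad]))  # 0.5
```

Cosine similarity is used here only because it is a common choice for comparing sentence embeddings; any similarity the evaluated model is trained with could be substituted.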

Paper Structure (24 sections, 5 equations, 6 figures, 9 tables, 1 algorithm)

Figures (6)

  • Figure 1: Intent Semantics Toolkit (top) for benchmarking how well embedding models understand semantic concepts, and training pipeline (bottom) for synthesizing training data that improves this understanding. For benchmarking data, we prompt the LLM to generate negation and implicature variants of original utterances, which are validated by both automatic and manual quality control. For training data, we first extract intents from unlabeled utterances, then generate hard examples with the LLM, which are combined with retrieved utterances for fine-tuning.
  • Figure 2: The instructor-large model embeds original utterances farther from semantically similar implicature utterances and closer to semantically dissimilar negation utterances, a failure mode for the triplet task (as seen in the t-SNE projection space).
  • Figure 3: Similarity metrics between the training data and the original, (generated) negation, and (generated) implicature test splits. Averaged results show that the negation utterances are closer in surface form to the original ones than the implicature utterances are.
  • Figure 4: Pearson correlations between tasks, using each task's dev-set performance as its feature vector.
  • Figure 5: Intent extraction pipeline. We first prompt the LLM to generate the user goal from the utterance. We then extract the action-object pair from the generated goal. For goals where no object is found in the second step, we further prompt the LLM to summarize the object. Finally, we save the goal and action-object pair for subsequent generation.
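Figure 1 describes fine-tuning on LLM-generated hard examples combined with retrieved utterances, and the abstract mentions a contrastive loss term. As a minimal sketch of one common contrastive formulation, an InfoNCE-style loss with a hard negative, we assume (hypothetically; this is not confirmed as the paper's exact loss) that a generated implicature is the positive for an anchor utterance, the generated negation is a hard negative, and retrieved utterances are additional negatives. The names `info_nce_loss` and `temperature`, and the default value 0.05, are illustrative assumptions.

```python
import math
from typing import Sequence

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def info_nce_loss(
    anchor: Vector,
    positive: Vector,
    negatives: Sequence[Vector],
    temperature: float = 0.05,
) -> float:
    """InfoNCE loss for one anchor: negative log of the positive's softmax score.

    Assumed (hypothetical) setup: positive = generated implicature;
    negatives = generated negation (hard negative) plus retrieved utterances.
    """
    # Index 0 holds the positive logit; the rest are negative logits.
    logits = [cosine(anchor, positive) / temperature]
    logits += [cosine(anchor, n) / temperature for n in negatives]
    # Log-sum-exp with max subtraction for numerical stability.
    m = max(logits)
    log_denominator = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_denominator - logits[0]


# Toy 2-d embeddings (made up). The negation is closer to the anchor than the
# implicature is, so the loss is large and training would push them apart.
anchor = [1.0, 0.0]
implicature = [0.8, 0.6]
negation = [0.99, 0.14]  # surface-similar hard negative
retrieved = [[0.0, 1.0], [-1.0, 0.0]]
print(f"loss = {info_nce_loss(anchor, implicature, [negation, *retrieved]):.3f}")
```

Including the negation as an explicit hard negative is what distinguishes this from plain in-batch contrastive training: because negations are close to the original in surface form (Figure 3), random negatives alone would rarely penalize the failure mode shown in Figure 2.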