New Intent Discovery with Pre-training and Contrastive Learning

Yuwei Zhang, Haode Zhang, Li-Ming Zhan, Albert Y. S. Lam, Xiao-Ming Wu

TL;DR

This paper addresses new intent discovery (NID), aiming to identify novel intents from unlabeled user utterances to extend predefined intents. It proposes a two-stage framework: Stage 1 multi-task pre-training (MTP) leverages external labeled datasets and internal unlabeled data to learn task-aware utterance representations, and Stage 2 neighborhood-aware contrastive learning (CLNN) uses nearest neighbors to produce compact embeddings suitable for clustering. Empirical results on three benchmarks show that MTP significantly outperforms baselines in both unsupervised and semi-supervised NID, and CLNN provides additional gains, achieving state-of-the-art performance while reducing reliance on domain-specific labels. The approach offers practical value for dialogue systems by enabling effective knowledge transfer with limited labeled data and robust clustering of unknown intents.

Abstract

New intent discovery aims to uncover novel intent categories from user utterances to expand the set of supported intent classes. It is a critical task for the development and service expansion of a practical dialogue system. Despite its importance, this problem remains under-explored in the literature. Existing approaches typically rely on a large amount of labeled utterances and employ pseudo-labeling methods for representation learning and clustering, which are label-intensive, inefficient, and inaccurate. In this paper, we provide new solutions to two important research questions for new intent discovery: (1) how to learn semantic utterance representations and (2) how to better cluster utterances. Particularly, we first propose a multi-task pre-training strategy to leverage rich unlabeled data along with external labeled data for representation learning. Then, we design a new contrastive loss to exploit self-supervisory signals in unlabeled data for clustering. Extensive experiments on three intent recognition benchmarks demonstrate the high effectiveness of our proposed method, which outperforms state-of-the-art methods by a large margin in both unsupervised and semi-supervised scenarios. The source code will be available at https://github.com/zhang-yu-wei/MTP-CLNN.

Paper Structure (16 sections, 3 equations, 7 figures, 7 tables)

Figures (7)

  • Figure 1: New Intent Discovery.
  • Figure 2: The left part shows the overall workflow of our method, where the training order is indicated by the red arrow. The datasets and corresponding loss functions used in each training stage are indicated by the black arrows. The right part illustrates a simple example of CLNN. A batch of four training instances $\{x_i\}_{i=1}^4$ (solid markers) and their respective neighborhoods $\{\mathcal{N}_i\}_{i=1}^4$ (hollow markers within large circles) are plotted. Since $x_2$ falls within $\mathcal{N}_1$, $x_2$ and its neighbors are taken as positive instances for $x_1$ (but not vice versa, since $x_1$ is not in $\mathcal{N}_2$). We also show an example of the adjacency matrix $\mathbf{A}^{\prime}$ and the augmented batch $\mathcal{B}^{\prime}$. The pairwise relationships with the first instance in the batch are plotted, with solid lines indicating positive pairs and dashed lines indicating negative pairs.
  • Figure 3: Visualization of embeddings on StackOverflow. $\text{KCR}=25\%$, $\text{LAR}=10\%$. Best viewed in color.
  • Figure 4: Ablation study on the effectiveness of MTP. $\text{LAR}$ is set to $10\%$. SUP stands for supervised pre-training on internal labeled data only. The three columns correspond to results on the three metrics, respectively.
  • Figure 5: Analysis of the number of nearest neighbors in CLNN for unsupervised NID. Vertical dashed lines correspond to our empirical estimates of the optima. Horizontal dashed lines represent the results of training with MTP only. When the number of nearest neighbors is $0$, we simply augment the same instance twice, as in conventional contrastive learning (Chen et al., 2020). The three columns correspond to results on the three metrics, respectively.
  • ...and 2 more figures
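
The neighborhood relation described in the Figure 2 caption can be illustrated with a small sketch. It is not the authors' implementation: the function name `knn_adjacency`, the use of cosine similarity, the choice of k, and the toy 2-D embeddings are all assumptions made for illustration, and the sketch covers only the plain neighborhood matrix, not the adjacency matrix $\mathbf{A}^{\prime}$ over the augmented batch $\mathcal{B}^{\prime}$. It shows why the relation is asymmetric: $x_j$ can lie in $\mathcal{N}_i$ while $x_i$ does not lie in $\mathcal{N}_j$.

```python
import math


def knn_adjacency(embeddings, k):
    """Return A where A[i][j] == 1 iff x_j is among the k nearest neighbors of x_i.

    Hypothetical helper: nearness is measured by cosine similarity (an assumption),
    and the diagonal is left at 0, i.e. an instance is not its own neighbor.
    """
    def cosine(u, v):
        dot = sum(a * b for a, b in zip(u, v))
        norm_u = math.sqrt(sum(a * a for a in u))
        norm_v = math.sqrt(sum(b * b for b in v))
        return dot / (norm_u * norm_v)

    n = len(embeddings)
    adj = [[0] * n for _ in range(n)]
    for i in range(n):
        # Rank all other instances by similarity to x_i, most similar first.
        others = [j for j in range(n) if j != i]
        others.sort(key=lambda j: cosine(embeddings[i], embeddings[j]), reverse=True)
        for j in others[:k]:
            adj[i][j] = 1
    return adj


# Toy batch of four unit vectors at 0, 10, 15 and 90 degrees.
angles = [0.0, 10.0, 15.0, 90.0]
batch = [(math.cos(math.radians(a)), math.sin(math.radians(a))) for a in angles]
A = knn_adjacency(batch, k=1)

# The second instance is the nearest neighbor of the first, but the nearest
# neighbor of the second is the third, so the relation is not symmetric.
print(A[0][1], A[1][0])
```

In this toy batch the first instance would treat the second as a positive, while the second would not treat the first as one, mirroring the $x_1$/$x_2$ example in the caption.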